From patchwork Sun Apr 24 10:15:50 2022
X-Patchwork-Submitter: Lei Wang
X-Patchwork-Id: 12824841
From: Lei Wang
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com, wanpengli@tencent.com, jmattson@google.com, joro@8bytes.org
Cc: lei4.wang@intel.com, chenyi.qiang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v7 1/8] KVM: VMX: Introduce PKS VMCS fields
Date: Sun, 24 Apr 2022 03:15:50 -0700
Message-Id: <20220424101557.134102-2-lei4.wang@intel.com>
In-Reply-To: <20220424101557.134102-1-lei4.wang@intel.com>
References: <20220424101557.134102-1-lei4.wang@intel.com>
X-Mailing-List: kvm@vger.kernel.org

From: Chenyi Qiang

PKS (Protection Keys for Supervisor Pages) is a feature that extends the Protection Key architecture to support thread-specific permission restrictions on supervisor pages. A new PKS MSR (PKRS) is defined to support PKS; it holds a set of permissions associated with each protection domain. Two VMCS fields, {HOST,GUEST}_IA32_PKRS, are introduced in the {host,guest}-state area to store the respective values of PKRS. Every VM exit saves PKRS into the guest-state area. If VM_EXIT_LOAD_IA32_PKRS = 1, VM exit loads PKRS from the host-state area. If VM_ENTRY_LOAD_IA32_PKRS = 1, VM entry loads PKRS from the guest-state area.
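As a rough illustration of how the two controls interact with the new fields, a minimal user-space model may help; it is not KVM code, and all structure and function names below are invented for illustration only.

/*
 * Illustrative model only, not KVM code. It mimics the behaviour described
 * above: PKRS is saved to the guest-state area on VM exit, the host-state
 * value is loaded when VM_EXIT_LOAD_IA32_PKRS is set, and the guest-state
 * value is loaded on VM entry when VM_ENTRY_LOAD_IA32_PKRS is set.
 */
#include <stdint.h>
#include <stdio.h>

#define VM_EXIT_LOAD_IA32_PKRS	0x20000000u
#define VM_ENTRY_LOAD_IA32_PKRS	0x00400000u

struct vmcs_model {
	uint64_t guest_ia32_pkrs;	/* GUEST_IA32_PKRS (0x2818) */
	uint64_t host_ia32_pkrs;	/* HOST_IA32_PKRS  (0x2c06) */
	uint32_t vm_exit_controls;
	uint32_t vm_entry_controls;
};

static void model_vm_entry(struct vmcs_model *v, uint64_t *pkrs_msr)
{
	if (v->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS)
		*pkrs_msr = v->guest_ia32_pkrs;
}

static void model_vm_exit(struct vmcs_model *v, uint64_t *pkrs_msr)
{
	v->guest_ia32_pkrs = *pkrs_msr;		/* saved on every VM exit */
	if (v->vm_exit_controls & VM_EXIT_LOAD_IA32_PKRS)
		*pkrs_msr = v->host_ia32_pkrs;	/* restore the host value */
}

int main(void)
{
	uint64_t pkrs = 0;
	struct vmcs_model v = {
		.host_ia32_pkrs = 0x55555554,	/* host: AD set for keys 1-15 */
		.vm_exit_controls = VM_EXIT_LOAD_IA32_PKRS,
		.vm_entry_controls = VM_ENTRY_LOAD_IA32_PKRS,
	};

	model_vm_entry(&v, &pkrs);
	pkrs = 0xc;				/* guest sets AD+WD for key 1 */
	model_vm_exit(&v, &pkrs);
	printf("guest PKRS saved=%#llx, active PKRS=%#llx\n",
	       (unsigned long long)v.guest_ia32_pkrs,
	       (unsigned long long)pkrs);
	return 0;
}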
Signed-off-by: Chenyi Qiang Reviewed-by: Jim Mattson Reviewed-by: Sean Christopherson --- arch/x86/include/asm/vmx.h | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 0ffaa3156a4e..7962d506ba91 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -95,6 +95,7 @@ #define VM_EXIT_CLEAR_BNDCFGS 0x00800000 #define VM_EXIT_PT_CONCEAL_PIP 0x01000000 #define VM_EXIT_CLEAR_IA32_RTIT_CTL 0x02000000 +#define VM_EXIT_LOAD_IA32_PKRS 0x20000000 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff @@ -108,6 +109,7 @@ #define VM_ENTRY_LOAD_BNDCFGS 0x00010000 #define VM_ENTRY_PT_CONCEAL_PIP 0x00020000 #define VM_ENTRY_LOAD_IA32_RTIT_CTL 0x00040000 +#define VM_ENTRY_LOAD_IA32_PKRS 0x00400000 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff @@ -245,12 +247,16 @@ enum vmcs_field { GUEST_BNDCFGS_HIGH = 0x00002813, GUEST_IA32_RTIT_CTL = 0x00002814, GUEST_IA32_RTIT_CTL_HIGH = 0x00002815, + GUEST_IA32_PKRS = 0x00002818, + GUEST_IA32_PKRS_HIGH = 0x00002819, HOST_IA32_PAT = 0x00002c00, HOST_IA32_PAT_HIGH = 0x00002c01, HOST_IA32_EFER = 0x00002c02, HOST_IA32_EFER_HIGH = 0x00002c03, HOST_IA32_PERF_GLOBAL_CTRL = 0x00002c04, HOST_IA32_PERF_GLOBAL_CTRL_HIGH = 0x00002c05, + HOST_IA32_PKRS = 0x00002c06, + HOST_IA32_PKRS_HIGH = 0x00002c07, PIN_BASED_VM_EXEC_CONTROL = 0x00004000, CPU_BASED_VM_EXEC_CONTROL = 0x00004002, EXCEPTION_BITMAP = 0x00004004, From patchwork Sun Apr 24 10:15:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lei Wang X-Patchwork-Id: 12824842 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A0865C4332F for ; Sun, 24 Apr 2022 10:16:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239010AbiDXKTB (ORCPT ); Sun, 24 Apr 2022 06:19:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59958 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238997AbiDXKTA (ORCPT ); Sun, 24 Apr 2022 06:19:00 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 064B34ECDE; Sun, 24 Apr 2022 03:16:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650795360; x=1682331360; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=jf0sdA0pfpDS78svVoTxE3uyUlT8Z1t9Z6i4qHugW48=; b=djZM3MvyJXW14BFboeb2S63+IP3St6NHVilkj5x1tTZsC2LYElA4UWpI VxEkzJjuON9oPuULohzBqrTz/fT2N1c7MDZJ52NOe6XPcjBdLX9mlBgNs NO7cCByzbFPVy2CtwYNqd/K2bv0kPFP2Or/PjWtkW7jAad5asjCcRcyif b51+ggYTFJgO64zv/nad5GHDv0XaBRdNDAai8gn+04kOZo0hemNPIexez dG/LcfpmbNV6vImupNv83kwlg1ShnxpEaVJcpTwTt1ObDFQdR5Bajyivg gc/dQ/4sQobn4qcAy68dO6ARvS+lt+pjl1rvmCNkHQxrHFyB3H/jrhvUd w==; X-IronPort-AV: E=McAfee;i="6400,9594,10326"; a="264813943" X-IronPort-AV: E=Sophos;i="5.90,286,1643702400"; d="scan'208";a="264813943" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 03:15:59 -0700 X-IronPort-AV: E=Sophos;i="5.90,286,1643702400"; d="scan'208";a="616086710" Received: from 984fee00be24.jf.intel.com ([10.165.54.246]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 
Apr 2022 03:15:58 -0700 From: Lei Wang To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com, wanpengli@tencent.com, jmattson@google.com, joro@8bytes.org Cc: lei4.wang@intel.com, chenyi.qiang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 2/8] KVM: VMX: Add proper cache tracking for PKRS Date: Sun, 24 Apr 2022 03:15:51 -0700 Message-Id: <20220424101557.134102-3-lei4.wang@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220424101557.134102-1-lei4.wang@intel.com> References: <20220424101557.134102-1-lei4.wang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Chenyi Qiang Add PKRS caching into the standard register caching mechanism in order to take advantage of the availability checks provided by regs_avail. This is because vcpu->arch.pkrs will be rarely acceesed by KVM, only in the case of host userspace MSR reads and GVA->GPA translation in following patches. It is unnecessary to keep it up-to-date at all times. It also should be noted that the potential benefits of this caching are tenuous because the MSR read is not a hot path. it's nice-to-have so that we don't hesitate to rip it out in the future if there's a strong reason to drop the caching. Signed-off-by: Chenyi Qiang Co-developed-by: Lei Wang Signed-off-by: Lei Wang --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/kvm_cache_regs.h | 7 +++++++ arch/x86/kvm/vmx/vmx.c | 11 +++++++++++ arch/x86/kvm/vmx/vmx.h | 3 ++- 4 files changed, 22 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e0c0f0e1f754..f5455bada8cd 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -180,6 +180,7 @@ enum kvm_reg { VCPU_EXREG_SEGMENTS, VCPU_EXREG_EXIT_INFO_1, VCPU_EXREG_EXIT_INFO_2, + VCPU_EXREG_PKRS, }; enum { @@ -638,6 +639,7 @@ struct kvm_vcpu_arch { unsigned long cr8; u32 host_pkru; u32 pkru; + u32 pkrs; u32 hflags; u64 efer; u64 apic_base; diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h index 3febc342360c..2b2540ca584f 100644 --- a/arch/x86/kvm/kvm_cache_regs.h +++ b/arch/x86/kvm/kvm_cache_regs.h @@ -177,6 +177,13 @@ static inline u64 kvm_read_edx_eax(struct kvm_vcpu *vcpu) | ((u64)(kvm_rdx_read(vcpu) & -1u) << 32); } +static inline u32 kvm_read_pkrs(struct kvm_vcpu *vcpu) +{ + if (!kvm_register_is_available(vcpu, VCPU_EXREG_PKRS)) + static_call(kvm_x86_cache_reg)(vcpu, VCPU_EXREG_PKRS); + return vcpu->arch.pkrs; +} + static inline void enter_guest_mode(struct kvm_vcpu *vcpu) { vcpu->arch.hflags |= HF_GUEST_MASK; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 04d170c4b61e..395b2deb76aa 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2258,6 +2258,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) { unsigned long guest_owned_bits; + u64 ia32_pkrs; kvm_register_mark_available(vcpu, reg); @@ -2292,6 +2293,16 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) vcpu->arch.cr4 &= ~guest_owned_bits; vcpu->arch.cr4 |= vmcs_readl(GUEST_CR4) & guest_owned_bits; break; + case VCPU_EXREG_PKRS: + /* + * The high 32 bits of PKRS are reserved and attempting to write + * non-zero value will cause #GP. KVM intentionally drops those + * bits. 
+ */ + ia32_pkrs = vmcs_read64(GUEST_IA32_PKRS); + WARN_ON_ONCE(ia32_pkrs >> 32); + vcpu->arch.pkrs = ia32_pkrs; + break; default: KVM_BUG_ON(1, vcpu->kvm); break; diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 9c6bfcd84008..661df9584b12 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -499,7 +499,8 @@ BUILD_CONTROLS_SHADOW(secondary_exec, SECONDARY_VM_EXEC_CONTROL) (1 << VCPU_EXREG_CR3) | \ (1 << VCPU_EXREG_CR4) | \ (1 << VCPU_EXREG_EXIT_INFO_1) | \ - (1 << VCPU_EXREG_EXIT_INFO_2)) + (1 << VCPU_EXREG_EXIT_INFO_2) | \ + (1 << VCPU_EXREG_PKRS)) static inline struct kvm_vmx *to_kvm_vmx(struct kvm *kvm) { From patchwork Sun Apr 24 10:15:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lei Wang X-Patchwork-Id: 12824843 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B238AC43219 for ; Sun, 24 Apr 2022 10:16:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239037AbiDXKTI (ORCPT ); Sun, 24 Apr 2022 06:19:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60018 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239006AbiDXKTB (ORCPT ); Sun, 24 Apr 2022 06:19:01 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 25580140A5; Sun, 24 Apr 2022 03:16:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650795361; x=1682331361; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ay5OMiAFRmre8MjfAZWNoQ/Mn5N9l/Abp3RB2vmddBI=; b=HE85WumgARlpkn5UQXFPx3BokJ4bUbDSaf9iMkWuyfuDTUuybZn8c3/e wygPR7/MOZ4x5XQ80RWb66dTTcF1zLj5dYxBM2fx7yh2PwGPNl4alibbv I+EJIXVbQY1HKDgBrsGAt8iRMbLrDIbvKIVow2c2Kv0K2qdsWNSBOHg6c LxN94Pq2x6EeOixrJr6LrqUOjiB7zLD2zc/+zB2DJne+Xx2/NXn5KtKIT dYwTBc6iooewZWMMdE3wEmYP6dWvjmORAEbj6qS54PA+KDfaGXKiO30nF iVlKu7B71M+KgPAPbFDghCw7/OJ/j+5XZsAbjMGp9/NU9vE0ZJcCRJv38 Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10326"; a="264813944" X-IronPort-AV: E=Sophos;i="5.90,286,1643702400"; d="scan'208";a="264813944" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 03:15:59 -0700 X-IronPort-AV: E=Sophos;i="5.90,286,1643702400"; d="scan'208";a="616086713" Received: from 984fee00be24.jf.intel.com ([10.165.54.246]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 03:15:59 -0700 From: Lei Wang To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com, wanpengli@tencent.com, jmattson@google.com, joro@8bytes.org Cc: lei4.wang@intel.com, chenyi.qiang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 3/8] KVM: X86: Expose IA32_PKRS MSR Date: Sun, 24 Apr 2022 03:15:52 -0700 Message-Id: <20220424101557.134102-4-lei4.wang@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220424101557.134102-1-lei4.wang@intel.com> References: <20220424101557.134102-1-lei4.wang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Chenyi Qiang Protection Key for Superviosr Pages (PKS) uses IA32_PKRS MSR (PKRS) at index 0x6E1 to allow software to manage superviosr 
key rights, i.e. it can enforce additional permissions checks besides normal paging protections via a MSR update without TLB flushes when permissions change. For performance consideration, PKRS intercept in KVM will be disabled when PKS is supported in guest so that PKRS can be accessed without VM exit. PKS introduces dedicated control fields in VMCS to switch PKRS, which only does the retore part. In addition, every VM exit saves PKRS into the guest-state area in VMCS, while VM enter won't save the host value due to the expectation that the host won't change the MSR often. Update the host's value in VMCS manually if the MSR has been changed by the kernel since the last time the VMCS was run. Introduce a function get_current_pkrs() in arch/x86/mm/pkeys.c to export the per-cpu variable pkrs_cache to avoid frequent rdmsr of PKRS. Signed-off-by: Chenyi Qiang Co-developed-by: Lei Wang Signed-off-by: Lei Wang --- arch/x86/kvm/vmx/vmcs.h | 1 + arch/x86/kvm/vmx/vmx.c | 63 +++++++++++++++++++++++++++++++++++++---- arch/x86/kvm/vmx/vmx.h | 9 +++++- arch/x86/kvm/x86.c | 9 +++++- arch/x86/kvm/x86.h | 6 ++++ arch/x86/mm/pkeys.c | 6 ++++ include/linux/pks.h | 7 +++++ 7 files changed, 94 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h index e325c290a816..ee37741b2b9d 100644 --- a/arch/x86/kvm/vmx/vmcs.h +++ b/arch/x86/kvm/vmx/vmcs.h @@ -42,6 +42,7 @@ struct vmcs_host_state { #ifdef CONFIG_X86_64 u16 ds_sel, es_sel; #endif + u32 pkrs; }; struct vmcs_controls_shadow { diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 395b2deb76aa..9d0588e85410 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include @@ -172,6 +173,7 @@ static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = { MSR_CORE_C3_RESIDENCY, MSR_CORE_C6_RESIDENCY, MSR_CORE_C7_RESIDENCY, + MSR_IA32_PKRS, }; /* @@ -1111,6 +1113,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) #endif unsigned long fs_base, gs_base; u16 fs_sel, gs_sel; + u32 host_pkrs; int i; vmx->req_immediate_exit = false; @@ -1146,6 +1149,17 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) */ host_state->ldt_sel = kvm_read_ldt(); + /* + * Update the host pkrs vmcs field before vcpu runs. 
+ * The setting of VM_EXIT_LOAD_IA32_PKRS can ensure + * kvm_cpu_cap_has(X86_FEATURE_PKS) && + * guest_cpuid_has(vcpu, X86_FEATURE_PKS) + */ + if (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_IA32_PKRS) { + host_pkrs = get_current_pkrs(); + vmx_set_host_pkrs(host_state, host_pkrs); + } + #ifdef CONFIG_X86_64 savesegment(ds, host_state->ds_sel); savesegment(es, host_state->es_sel); @@ -1901,6 +1915,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_DEBUGCTLMSR: msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL); break; + case MSR_IA32_PKRS: + if (!kvm_cpu_cap_has(X86_FEATURE_PKS) || + (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_PKS))) + return 1; + msr_info->data = kvm_read_pkrs(vcpu); + break; default: find_uret_msr: msr = vmx_find_uret_msr(vmx, msr_info->index); @@ -2242,7 +2263,17 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) } ret = kvm_set_msr_common(vcpu, msr_info); break; - + case MSR_IA32_PKRS: + if (!kvm_pkrs_valid(data)) + return 1; + if (!kvm_cpu_cap_has(X86_FEATURE_PKS) || + (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_PKS))) + return 1; + vcpu->arch.pkrs = data; + kvm_register_mark_available(vcpu, VCPU_EXREG_PKRS); + vmcs_write64(GUEST_IA32_PKRS, data); + break; default: find_uret_msr: msr = vmx_find_uret_msr(vmx, msr_index); @@ -2533,7 +2564,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, VM_EXIT_LOAD_IA32_EFER | VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_PT_CONCEAL_PIP | - VM_EXIT_CLEAR_IA32_RTIT_CTL; + VM_EXIT_CLEAR_IA32_RTIT_CTL | + VM_EXIT_LOAD_IA32_PKRS; if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS, &_vmexit_control) < 0) return -EIO; @@ -2557,7 +2589,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, VM_ENTRY_LOAD_IA32_EFER | VM_ENTRY_LOAD_BNDCFGS | VM_ENTRY_PT_CONCEAL_PIP | - VM_ENTRY_LOAD_IA32_RTIT_CTL; + VM_ENTRY_LOAD_IA32_RTIT_CTL | + VM_ENTRY_LOAD_IA32_PKRS; if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS, &_vmentry_control) < 0) return -EIO; @@ -4166,7 +4199,8 @@ static u32 vmx_vmentry_ctrl(void) VM_ENTRY_LOAD_IA32_RTIT_CTL); /* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */ return vmentry_ctrl & - ~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | VM_ENTRY_LOAD_IA32_EFER); + ~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | VM_ENTRY_LOAD_IA32_EFER | + VM_ENTRY_LOAD_IA32_PKRS); } static u32 vmx_vmexit_ctrl(void) @@ -4178,7 +4212,8 @@ static u32 vmx_vmexit_ctrl(void) VM_EXIT_CLEAR_IA32_RTIT_CTL); /* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */ return vmexit_ctrl & - ~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER); + ~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER | + VM_EXIT_LOAD_IA32_PKRS); } static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) @@ -5923,6 +5958,8 @@ void dump_vmcs(struct kvm_vcpu *vcpu) vmcs_read64(GUEST_IA32_PERF_GLOBAL_CTRL)); if (vmentry_ctl & VM_ENTRY_LOAD_BNDCFGS) pr_err("BndCfgS = 0x%016llx\n", vmcs_read64(GUEST_BNDCFGS)); + if (vmentry_ctl & VM_ENTRY_LOAD_IA32_PKRS) + pr_err("PKRS = 0x%016llx\n", vmcs_read64(GUEST_IA32_PKRS)); pr_err("Interruptibility = %08x ActivityState = %08x\n", vmcs_read32(GUEST_INTERRUPTIBILITY_INFO), vmcs_read32(GUEST_ACTIVITY_STATE)); @@ -5964,6 +6001,8 @@ void dump_vmcs(struct kvm_vcpu *vcpu) vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL)); if (vmcs_read32(VM_EXIT_MSR_LOAD_COUNT) > 0) vmx_dump_msrs("host autoload", &vmx->msr_autoload.host); + if (vmexit_ctl & VM_EXIT_LOAD_IA32_PKRS) + pr_err("PKRS = 
0x%016llx\n", vmcs_read64(HOST_IA32_PKRS)); pr_err("*** Control State ***\n"); pr_err("PinBased=%08x CPUBased=%08x SecondaryExec=%08x\n", @@ -7406,6 +7445,20 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) /* Refresh #PF interception to account for MAXPHYADDR changes. */ vmx_update_exception_bitmap(vcpu); + + if (kvm_cpu_cap_has(X86_FEATURE_PKS)) { + if (guest_cpuid_has(vcpu, X86_FEATURE_PKS)) { + vmx_disable_intercept_for_msr(vcpu, MSR_IA32_PKRS, MSR_TYPE_RW); + + vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_PKRS); + vm_exit_controls_setbit(vmx, VM_EXIT_LOAD_IA32_PKRS); + } else { + vmx_enable_intercept_for_msr(vcpu, MSR_IA32_PKRS, MSR_TYPE_RW); + + vm_entry_controls_clearbit(vmx, VM_ENTRY_LOAD_IA32_PKRS); + vm_exit_controls_clearbit(vmx, VM_EXIT_LOAD_IA32_PKRS); + } + } } static __init void vmx_set_cpu_caps(void) diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 661df9584b12..91723a226bf3 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -352,7 +352,7 @@ struct vcpu_vmx { struct lbr_desc lbr_desc; /* Save desired MSR intercept (read: pass-through) state */ -#define MAX_POSSIBLE_PASSTHROUGH_MSRS 15 +#define MAX_POSSIBLE_PASSTHROUGH_MSRS 16 struct { DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS); DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS); @@ -580,4 +580,11 @@ static inline int vmx_get_instr_info_reg2(u32 vmx_instr_info) return (vmx_instr_info >> 28) & 0xf; } +static inline void vmx_set_host_pkrs(struct vmcs_host_state *host, u32 pkrs){ + if (unlikely(pkrs != host->pkrs)) { + vmcs_write64(HOST_IA32_PKRS, pkrs); + host->pkrs = pkrs; + } +} + #endif /* __KVM_X86_VMX_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 547ba00ef64f..d784bf3a4b3e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1396,7 +1396,7 @@ static const u32 msrs_to_save_all[] = { MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B, MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B, MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B, - MSR_IA32_UMWAIT_CONTROL, + MSR_IA32_UMWAIT_CONTROL, MSR_IA32_PKRS, MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1, MSR_ARCH_PERFMON_FIXED_CTR0 + 2, @@ -6638,6 +6638,10 @@ static void kvm_init_msr_list(void) intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2) continue; break; + case MSR_IA32_PKRS: + if (!kvm_cpu_cap_has(X86_FEATURE_PKS)) + continue; + break; case MSR_ARCH_PERFMON_PERFCTR0 ... 
MSR_ARCH_PERFMON_PERFCTR0 + 17: if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_PERFCTR0 >= min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp)) @@ -11410,6 +11414,9 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) kvm_set_rflags(vcpu, X86_EFLAGS_FIXED); kvm_rip_write(vcpu, 0xfff0); + if (!init_event && kvm_cpu_cap_has(X86_FEATURE_PKS)) + __kvm_set_msr(vcpu, MSR_IA32_PKRS, 0, true); + vcpu->arch.cr3 = 0; kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3); diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 588792f00334..7610f0d40b0f 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -404,6 +404,12 @@ static inline void kvm_machine_check(void) #endif } +static inline bool kvm_pkrs_valid(u64 data) +{ + /* bit[63,32] must be zero */ + return !(data >> 32); +} + void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu); void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu); int kvm_spec_ctrl_test_value(u64 value); diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 74ba51b9853b..bd75af62b685 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -495,4 +495,10 @@ void pks_update_exception(struct pt_regs *regs, u8 pkey, u8 protection) } EXPORT_SYMBOL_GPL(pks_update_exception); +u32 get_current_pkrs(void) +{ + return this_cpu_read(pkrs_cache); +} +EXPORT_SYMBOL_GPL(get_current_pkrs); + #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ diff --git a/include/linux/pks.h b/include/linux/pks.h index ce8eea81f208..0a71f8f4055d 100644 --- a/include/linux/pks.h +++ b/include/linux/pks.h @@ -53,6 +53,8 @@ static inline void pks_set_readwrite(u8 pkey) typedef bool (*pks_key_callback)(struct pt_regs *regs, unsigned long address, bool write); +u32 get_current_pkrs(void); + #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ static inline bool pks_available(void) @@ -68,6 +70,11 @@ static inline void pks_update_exception(struct pt_regs *regs, u8 protection) { } +static inline u32 get_current_pkrs(void) +{ + return 0; +} + #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ #ifdef CONFIG_PKS_TEST From patchwork Sun Apr 24 10:15:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lei Wang X-Patchwork-Id: 12824845 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 38AD9C433F5 for ; Sun, 24 Apr 2022 10:16:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239071AbiDXKTM (ORCPT ); Sun, 24 Apr 2022 06:19:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60020 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239007AbiDXKTB (ORCPT ); Sun, 24 Apr 2022 06:19:01 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 58CA04ECE6; Sun, 24 Apr 2022 03:16:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650795361; x=1682331361; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=AG6I0vol4fbSkjehs1prqA3p+B+p4WZSw+ippY+gYLU=; b=gBIesfDM3iaYbl7kM0oLAsrPky4/cNUVpkBunSZ6AD+kvLwwZZ/JJNrb mHue4qXiuLivxkx0Axb1lIhMX7R90zGGEse9Kay7uUCD+gikvHXxx7Glx 5a7ffAN9elL+5e7GCwbT3po7Brc5AWcnGvSLMg6AsaS/Y/CjKrLb3m1B7 cKSOG6iAlBIZ5AwABGxIVT9trciZ/StebuCuIZJlZspt0/BfrQa2X5rFd 
dkmZOThmuRt5E0WdwQNBPTwnOje5Kj3hCV8vFtpJ2cGjwr7p0mmym9LHO Lt1UnnUkKEdqLqsyDL8HlHlmSaVdJ8o0BqjXv3g+hXgw/xy0Xlw+hbop4 w==; X-IronPort-AV: E=McAfee;i="6400,9594,10326"; a="264813945" X-IronPort-AV: E=Sophos;i="5.90,286,1643702400"; d="scan'208";a="264813945" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 03:15:59 -0700 X-IronPort-AV: E=Sophos;i="5.90,286,1643702400"; d="scan'208";a="616086716" Received: from 984fee00be24.jf.intel.com ([10.165.54.246]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 03:15:59 -0700 From: Lei Wang To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com, wanpengli@tencent.com, jmattson@google.com, joro@8bytes.org Cc: lei4.wang@intel.com, chenyi.qiang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 4/8] KVM: MMU: Rename the pkru to pkr Date: Sun, 24 Apr 2022 03:15:53 -0700 Message-Id: <20220424101557.134102-5-lei4.wang@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220424101557.134102-1-lei4.wang@intel.com> References: <20220424101557.134102-1-lei4.wang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Chenyi Qiang PKRU represents the PKU register utilized in the protection key rights check for user pages. Protection Keys for Superviosr Pages (PKS) extends the protection key architecture to cover supervisor pages. Rename the *pkru* related variables and functions to *pkr* which stands for both of the PKRU and PKRS. It makes sense because PKS and PKU each have: - a single control register (PKRU and PKRS) - the same number of keys (16 in total) - the same format in control registers (Access and Write disable bits) PKS and PKU can also share the same bitmap pkr_mask cache conditions where protection key checks are needed, because they can share almost the same requirements for PK restrictions to cause a fault, except they focus on different pages (supervisor and user pages). Reviewed-by: Paolo Bonzini Signed-off-by: Chenyi Qiang --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/mmu.h | 12 ++++++------ arch/x86/kvm/mmu/mmu.c | 10 +++++----- 3 files changed, 12 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index f5455bada8cd..1014d6a2b069 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -459,7 +459,7 @@ struct kvm_mmu { * with PFEC.RSVD replaced by ACC_USER_MASK from the page tables. * Each domain has 2 bits which are ANDed with AD and WD from PKRU. */ - u32 pkru_mask; + u32 pkr_mask; u64 *pae_root; u64 *pml4_root; diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index e6cae6f22683..cb3f07e63778 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -239,8 +239,8 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, u32 errcode = PFERR_PRESENT_MASK; WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK)); - if (unlikely(mmu->pkru_mask)) { - u32 pkru_bits, offset; + if (unlikely(mmu->pkr_mask)) { + u32 pkr_bits, offset; /* * PKRU defines 32 bits, there are 16 domains and 2 @@ -248,15 +248,15 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, * index of the protection domain, so pte_pkey * 2 is * is the index of the first bit for the domain. 
*/ - pkru_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3; + pkr_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3; /* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */ offset = (pfec & ~1) + ((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT)); - pkru_bits &= mmu->pkru_mask >> offset; - errcode |= -pkru_bits & PFERR_PK_MASK; - fault |= (pkru_bits != 0); + pkr_bits &= mmu->pkr_mask >> offset; + errcode |= -pkr_bits & PFERR_PK_MASK; + fault |= (pkr_bits != 0); } return -(u32)fault & errcode; diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f9080ee50ffa..de665361548d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4631,12 +4631,12 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept) * away both AD and WD. For all reads or if the last condition holds, WD * only will be masked away. */ -static void update_pkru_bitmask(struct kvm_mmu *mmu) +static void update_pkr_bitmask(struct kvm_mmu *mmu) { unsigned bit; bool wp; - mmu->pkru_mask = 0; + mmu->pkr_mask = 0; if (!is_cr4_pke(mmu)) return; @@ -4671,7 +4671,7 @@ static void update_pkru_bitmask(struct kvm_mmu *mmu) /* PKRU.WD stops write access. */ pkey_bits |= (!!check_write) << 1; - mmu->pkru_mask |= (pkey_bits & 3) << pfec; + mmu->pkr_mask |= (pkey_bits & 3) << pfec; } } @@ -4683,7 +4683,7 @@ static void reset_guest_paging_metadata(struct kvm_vcpu *vcpu, reset_rsvds_bits_mask(vcpu, mmu); update_permission_bitmask(mmu, false); - update_pkru_bitmask(mmu); + update_pkr_bitmask(mmu); } static void paging64_init_context(struct kvm_mmu *context) @@ -4951,7 +4951,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly, context->root_level = level; context->direct_map = false; update_permission_bitmask(context, true); - context->pkru_mask = 0; + context->pkr_mask = 0; reset_rsvds_bits_mask_ept(vcpu, context, execonly, huge_page_level); reset_ept_shadow_zero_bits_mask(context, execonly); } From patchwork Sun Apr 24 10:15:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lei Wang X-Patchwork-Id: 12824844 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B0E58C433FE for ; Sun, 24 Apr 2022 10:16:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239049AbiDXKTL (ORCPT ); Sun, 24 Apr 2022 06:19:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60062 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239011AbiDXKTB (ORCPT ); Sun, 24 Apr 2022 06:19:01 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D4B54ECDE; Sun, 24 Apr 2022 03:16:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650795361; x=1682331361; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qz/GE1fawZvUOeTVkWMItT1dq54iN+MwN7o2/FhMXlA=; b=Sy+X/Iy9JpUaylKCSrQ884Vr+vfmIvLy48Ex9JN2BBIjaWxxJR+VVHw8 qq9pcTJOMKv9n5pZSk+WMQ+BRrhIXfSQDirMAUn48+hO5bEVDf9ieL6iU bUr9u+KbC+9NcOwajModtGE+jM7bya4iRRwVnkcfBv1wPiKELkm2qFA6E 7W4UXdk0mBaO5Vi2xv5YFZqc3OGuXoW0r6/T5TvK5gxBdBLDrjsWg+Ovs nOLeHlOr4+6VB9CUGH3O3HIU69NzCt3mhHMhNGhKmxCMs3cyY1Rrnm50O 
3h+xR8HJJLZoFXs1JVfWeCADk+KM3M5jzHOEs08ymIf6zlO4hqyH6IEVn A==; X-IronPort-AV: E=McAfee;i="6400,9594,10326"; a="264813946" X-IronPort-AV: E=Sophos;i="5.90,286,1643702400"; d="scan'208";a="264813946" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 03:16:00 -0700 X-IronPort-AV: E=Sophos;i="5.90,286,1643702400"; d="scan'208";a="616086719" Received: from 984fee00be24.jf.intel.com ([10.165.54.246]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 03:15:59 -0700 From: Lei Wang To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com, wanpengli@tencent.com, jmattson@google.com, joro@8bytes.org Cc: lei4.wang@intel.com, chenyi.qiang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 5/8] KVM: MMU: Add helper function to get pkr bits Date: Sun, 24 Apr 2022 03:15:54 -0700 Message-Id: <20220424101557.134102-6-lei4.wang@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220424101557.134102-1-lei4.wang@intel.com> References: <20220424101557.134102-1-lei4.wang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Extra the PKR stuff to a separate, non-inline helper, which is a preparation to introduce pks support. Signed-off-by: Lei Wang --- arch/x86/kvm/mmu.h | 20 +++++--------------- arch/x86/kvm/mmu/mmu.c | 21 +++++++++++++++++++++ 2 files changed, 26 insertions(+), 15 deletions(-) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index cb3f07e63778..cea03053a153 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -204,6 +204,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, return vcpu->arch.mmu->page_fault(vcpu, &fault); } +u32 kvm_mmu_pkr_bits(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, + unsigned pte_access, unsigned pte_pkey, unsigned int pfec); + /* * Check if a given access (described through the I/D, W/R and U/S bits of a * page fault error code pfec) causes a permission fault with the given PTE @@ -240,21 +243,8 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK)); if (unlikely(mmu->pkr_mask)) { - u32 pkr_bits, offset; - - /* - * PKRU defines 32 bits, there are 16 domains and 2 - * attribute bits per domain in pkru. pte_pkey is the - * index of the protection domain, so pte_pkey * 2 is - * is the index of the first bit for the domain. - */ - pkr_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3; - - /* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */ - offset = (pfec & ~1) + - ((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT)); - - pkr_bits &= mmu->pkr_mask >> offset; + u32 pkr_bits = + kvm_mmu_pkr_bits(vcpu, mmu, pte_access, pte_pkey, pfec); errcode |= -pkr_bits & PFERR_PK_MASK; fault |= (pkr_bits != 0); } diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index de665361548d..6d3276986102 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6477,3 +6477,24 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm) if (kvm->arch.nx_lpage_recovery_thread) kthread_stop(kvm->arch.nx_lpage_recovery_thread); } + +u32 kvm_mmu_pkr_bits(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, + unsigned pte_access, unsigned pte_pkey, unsigned int pfec) +{ + u32 pkr_bits, offset; + + /* + * PKRU defines 32 bits, there are 16 domains and 2 + * attribute bits per domain in pkru. 
pte_pkey is the + * index of the protection domain, so pte_pkey * 2 is + * is the index of the first bit for the domain. + */ + pkr_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3; + + /* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */ + offset = (pfec & ~1) + ((pte_access & PT_USER_MASK) + << (PFERR_RSVD_BIT - PT_USER_SHIFT)); + + pkr_bits &= mmu->pkr_mask >> offset; + return pkr_bits; +} From patchwork Sun Apr 24 10:15:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lei Wang X-Patchwork-Id: 12824847 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66E4AC433F5 for ; Sun, 24 Apr 2022 10:16:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239080AbiDXKTO (ORCPT ); Sun, 24 Apr 2022 06:19:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60152 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239015AbiDXKTC (ORCPT ); Sun, 24 Apr 2022 06:19:02 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4210F2A70E; Sun, 24 Apr 2022 03:16:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650795362; x=1682331362; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=5kTtTfmZGRocjo6hhmjT7Q1879f4hSENEQ9niFqYT6k=; b=IpCmVBGm0eAFnDxsGE/nzhRTR2Q2Y6yqqbZoyEQNtIhWCmUU1K3gkHLY T16kvSnYW+jlt8hlDJRvGy2EYKPoDqJx/HFwQJvAKfXeU5kbu1I02Cymr 8W0psFr/MOhAVFDX7mRSA9s7rdbvSoFzOK3DL4ofp5S3c7bCrnmpQxDsI PSUdwzUKDI5IN7tO1XZfaYTGiAM/VbPqRLN+/GQMpIdhO0X4Urg/yk2FU 1UrzPLH+tMQ+cIHd8wJL69SlNJJKi+qhN5xTRRSLbITpkkLTVTTtMYG0n Yv0upJYPbwCKdlIQp5WAN/DbuPVrB/lHcU3lgjrY7vHzYxmNzoAwdulVO Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10326"; a="264813947" X-IronPort-AV: E=Sophos;i="5.90,286,1643702400"; d="scan'208";a="264813947" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 03:16:00 -0700 X-IronPort-AV: E=Sophos;i="5.90,286,1643702400"; d="scan'208";a="616086722" Received: from 984fee00be24.jf.intel.com ([10.165.54.246]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 03:16:00 -0700 From: Lei Wang To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com, wanpengli@tencent.com, jmattson@google.com, joro@8bytes.org Cc: lei4.wang@intel.com, chenyi.qiang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 6/8] KVM: MMU: Add support for PKS emulation Date: Sun, 24 Apr 2022 03:15:55 -0700 Message-Id: <20220424101557.134102-7-lei4.wang@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220424101557.134102-1-lei4.wang@intel.com> References: <20220424101557.134102-1-lei4.wang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Chenyi Qiang Up until now, pkr_mask had 0 bits for supervisor pages (the U/S bit in page tables replaces the PFEC.RSVD in page fault error code). For PKS support, fill in the bits using the same algorithm used for user mode pages, but with CR4.PKE replaced by CR4.PKS. Because of this change, CR4.PKS must also be included in the MMU role. 
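A stand-alone sketch of the per-key check that PKRU and PKRS share is given below; the function and macro names are invented here and do not appear in the patch, and the full set of conditions (instruction fetches, terminal faults, CR0.WP) is spelled out in the rewritten comment in the diff that follows.

/*
 * Illustrative sketch only. Both PKRU and PKRS hold 16 keys x 2 bits
 * (AD, WD); which register is consulted depends on whether the address is
 * user-mode (PKRU, gated by CR4.PKE) or supervisor-mode (PKRS, gated by
 * CR4.PKS).
 */
#include <stdbool.h>
#include <stdint.h>

#define PKR_AD_BIT 0x1u	/* access-disable */
#define PKR_WD_BIT 0x2u	/* write-disable */

static inline uint32_t pkr_key_bits(uint32_t pkr, unsigned int pkey)
{
	return (pkr >> (pkey * 2)) & 3;
}

/*
 * @pkr: PKRU for a user-mode address, PKRS for a supervisor-mode address.
 * @wp_applies: write protection applies (user access, or CR0.WP = 1).
 */
static bool pkr_access_faults(uint32_t pkr, unsigned int pkey,
			      bool write, bool wp_applies)
{
	uint32_t bits = pkr_key_bits(pkr, pkey);

	if (bits & PKR_AD_BIT)
		return true;
	return write && wp_applies && (bits & PKR_WD_BIT);
}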
Signed-off-by: Chenyi Qiang Co-developed-by: Lei Wang Signed-off-by: Lei Wang --- arch/x86/include/asm/kvm_host.h | 10 +-- arch/x86/kvm/mmu.h | 3 +- arch/x86/kvm/mmu/mmu.c | 109 +++++++++++++++++++++----------- 3 files changed, 80 insertions(+), 42 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 1014d6a2b069..a245d9817f72 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -375,6 +375,7 @@ union kvm_mmu_extended_role { unsigned int cr4_smap:1; unsigned int cr4_smep:1; unsigned int cr4_la57:1; + unsigned int cr4_pks:1; unsigned int efer_lma:1; }; }; @@ -454,10 +455,11 @@ struct kvm_mmu { u8 permissions[16]; /* - * The pkru_mask indicates if protection key checks are needed. It - * consists of 16 domains indexed by page fault error code bits [4:1], - * with PFEC.RSVD replaced by ACC_USER_MASK from the page tables. - * Each domain has 2 bits which are ANDed with AD and WD from PKRU. + * The pkr_mask indicates if protection key checks are needed. + * It consists of 16 domains indexed by page fault error code + * bits[4:1] with PFEC.RSVD replaced by ACC_USER_MASK from the + * page tables. Each domain has 2 bits which are ANDed with AD + * and WD from PKRU/PKRS. */ u32 pkr_mask; diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index cea03053a153..6963c641e6ce 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -45,7 +45,8 @@ #define PT32E_ROOT_LEVEL 3 #define KVM_MMU_CR4_ROLE_BITS (X86_CR4_PSE | X86_CR4_PAE | X86_CR4_LA57 | \ - X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE) + X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE | \ + X86_CR4_PKS) #define KVM_MMU_CR0_ROLE_BITS (X86_CR0_PG | X86_CR0_WP) #define KVM_MMU_EFER_ROLE_BITS (EFER_LME | EFER_NX) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6d3276986102..a6cbc22d3312 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -209,6 +209,7 @@ BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, smep, X86_CR4_SMEP); BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, smap, X86_CR4_SMAP); BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, pke, X86_CR4_PKE); BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, la57, X86_CR4_LA57); +BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, pks, X86_CR4_PKS); BUILD_MMU_ROLE_REGS_ACCESSOR(efer, nx, EFER_NX); BUILD_MMU_ROLE_REGS_ACCESSOR(efer, lma, EFER_LMA); @@ -231,6 +232,7 @@ BUILD_MMU_ROLE_ACCESSOR(ext, cr4, smep); BUILD_MMU_ROLE_ACCESSOR(ext, cr4, smap); BUILD_MMU_ROLE_ACCESSOR(ext, cr4, pke); BUILD_MMU_ROLE_ACCESSOR(ext, cr4, la57); +BUILD_MMU_ROLE_ACCESSOR(ext, cr4, pks); BUILD_MMU_ROLE_ACCESSOR(base, efer, nx); static struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu) @@ -4608,37 +4610,58 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept) } /* -* PKU is an additional mechanism by which the paging controls access to -* user-mode addresses based on the value in the PKRU register. Protection -* key violations are reported through a bit in the page fault error code. -* Unlike other bits of the error code, the PK bit is not known at the -* call site of e.g. gva_to_gpa; it must be computed directly in -* permission_fault based on two bits of PKRU, on some machine state (CR4, -* CR0, EFER, CPL), and on other bits of the error code and the page tables. 
-* -* In particular the following conditions come from the error code, the -* page tables and the machine state: -* - PK is always zero unless CR4.PKE=1 and EFER.LMA=1 -* - PK is always zero if RSVD=1 (reserved bit set) or F=1 (instruction fetch) -* - PK is always zero if U=0 in the page tables -* - PKRU.WD is ignored if CR0.WP=0 and the access is a supervisor access. -* -* The PKRU bitmask caches the result of these four conditions. The error -* code (minus the P bit) and the page table's U bit form an index into the -* PKRU bitmask. Two bits of the PKRU bitmask are then extracted and ANDed -* with the two bits of the PKRU register corresponding to the protection key. -* For the first three conditions above the bits will be 00, thus masking -* away both AD and WD. For all reads or if the last condition holds, WD -* only will be masked away. -*/ + * Protection Key Rights (PKR) is an additional mechanism by which data accesses + * with 4-level or 5-level paging (EFER.LMA=1) may be disabled based on the + * Protection Key Rights Userspace (PRKU) or Protection Key Rights Supervisor + * (PKRS) registers. The Protection Key (PK) used for an access is a 4-bit + * value specified in bits 62:59 of the leaf PTE used to translate the address. + * + * PKRU and PKRS are 32-bit registers, with 16 2-bit entries consisting of an + * access-disable (AD) and write-disable (WD) bit. The PK from the leaf PTE is + * used to index the approriate PKR (see below), e.g. PK=1 would consume bits + * 3:2 (bit 3 == write-disable, bit 2 == access-disable). + * + * The PK register (PKRU vs. PKRS) indexed by the PK depends on the type of + * _address_ (not access type!). For a user-mode address, PKRU is used; for a + * supervisor-mode address, PKRS is used. An address is supervisor-mode if the + * U/S flag (bit 2) is 0 in at least one of the paging-structure entries, i.e. + * an address is user-mode if the U/S flag is 1 in _all_ entries. Again, this + * is the address type, not the the access type, e.g. a supervisor-mode _access_ + * will consume PKRU if the _address_ is a user-mode address. + * + * As alluded to above, PKR checks are only performed for data accesses; code + * fetches are not subject to PKR checks. Terminal page faults (!PRESENT or + * PFEC.RSVD=1) are also not subject to PKR checks. + * + * PKR write-disable checks for superivsor-mode _accesses_ are performed if and + * only if CR0.WP=1 (though access-disable checks still apply). + * + * In summary, PKR checks are based on (a) EFER.LMA, (b) CR4.PKE or CR4.PKS, + * (c) CR0.WP, (d) the PK in the leaf PTE, (e) two bits from the corresponding + * PKR{S,U} entry, (f) the access type (derived from the other PFEC bits), and + * (g) the address type (retrieved from the paging-structure entries). + * + * To avoid conditional branches in permission_fault(), the PKR bitmask caches + * the above inputs, except for (e) the PKR{S,U} entry. The FETCH, USER, and + * WRITE bits of the PFEC and the effective value of the paging-structures' U/S + * bit (slotted into the PFEC.RSVD position, bit 3) are used to index into the + * PKR bitmask (similar to the 4-bit Protection Key itself). The two bits of + * the PKR bitmask "entry" are then extracted and ANDed with the two bits of + * the PKR{S,U} register corresponding to the address type and protection key. + * + * E.g. for all values where PFEC.FETCH=1, the corresponding pkr_bitmask bits + * will be 00b, thus masking away the AD and WD bits from the PKR{S,U} register + * to suppress PKR checks on code fetches. 
+ */ static void update_pkr_bitmask(struct kvm_mmu *mmu) { unsigned bit; bool wp; - + bool cr4_pke = is_cr4_pke(mmu); + bool cr4_pks = is_cr4_pks(mmu); mmu->pkr_mask = 0; - if (!is_cr4_pke(mmu)) + if (!cr4_pke && !cr4_pks) return; wp = is_cr0_wp(mmu); @@ -4656,19 +4679,22 @@ static void update_pkr_bitmask(struct kvm_mmu *mmu) pte_user = pfec & PFERR_RSVD_MASK; /* - * Only need to check the access which is not an - * instruction fetch and is to a user page. + * need to check the access which is not an + * instruction fetch and + * - if cr4_pke 1-setting when accessing a user page. + * - if cr4_pks 1-setting when accessing a supervisor page. */ - check_pkey = (!ff && pte_user); + check_pkey = !ff && (pte_user ? cr4_pke : cr4_pks); + /* - * write access is controlled by PKRU if it is a - * user access or CR0.WP = 1. + * write access is controlled by PKRU/PKRS if + * it is a user access or CR0.WP = 1. */ check_write = check_pkey && wf && (uf || wp); - /* PKRU.AD stops both read and write access. */ + /* PKRU/PKRS.AD stops both read and write access. */ pkey_bits = !!check_pkey; - /* PKRU.WD stops write access. */ + /* PKRU/PKRS.WD stops write access. */ pkey_bits |= (!!check_write) << 1; mmu->pkr_mask |= (pkey_bits & 3) << pfec; @@ -4719,6 +4745,7 @@ static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu, /* PKEY and LA57 are active iff long mode is active. */ ext.cr4_pke = ____is_efer_lma(regs) && ____is_cr4_pke(regs); ext.cr4_la57 = ____is_efer_lma(regs) && ____is_cr4_la57(regs); + ext.cr4_pks = ____is_efer_lma(regs) && ____is_cr4_pks(regs); ext.efer_lma = ____is_efer_lma(regs); } @@ -6482,14 +6509,22 @@ u32 kvm_mmu_pkr_bits(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned pte_access, unsigned pte_pkey, unsigned int pfec) { u32 pkr_bits, offset; + u32 pkr; /* - * PKRU defines 32 bits, there are 16 domains and 2 - * attribute bits per domain in pkru. pte_pkey is the - * index of the protection domain, so pte_pkey * 2 is - * is the index of the first bit for the domain. + * PKRU and PKRS both define 32 bits. There are 16 domains + * and 2 attribute bits per domain in them. pte_key is the + * index of the protection domain, so pte_pkey * 2 is the + * index of the first bit for the domain. The use of PKRU + * versus PKRS is selected by the address type, as determined + * by the U/S bit in the paging-structure entries. */ - pkr_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3; + if (pte_access & PT_USER_MASK) + pkr = is_cr4_pke(mmu) ? vcpu->arch.pkru : 0; + else + pkr = is_cr4_pks(mmu) ? kvm_read_pkrs(vcpu) : 0; + + pkr_bits = (pkr >> pte_pkey * 2) & 3; /* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. 
*/ offset = (pfec & ~1) + ((pte_access & PT_USER_MASK) From patchwork Sun Apr 24 10:15:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lei Wang X-Patchwork-Id: 12824846 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50E80C433FE for ; Sun, 24 Apr 2022 10:16:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239062AbiDXKTQ (ORCPT ); Sun, 24 Apr 2022 06:19:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60110 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239012AbiDXKTC (ORCPT ); Sun, 24 Apr 2022 06:19:02 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 470DC4ECEA; Sun, 24 Apr 2022 03:16:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650795362; x=1682331362; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xk/86zUzUAfWuppfgsBbZJ/BJPyRckuC0Nw4Y+Xr0lM=; b=VQkPAklW3PF/cYFvUx9sgKsFwguZVScmimCm8JjvH22s/4Htf+XFJUUT ooi+NZynrQQMCfwKX/4PvlrfHKV5RqF0Nhjz8h8pHt28vYCsGOYHSYSJT xibu4cmNdfC4AstMSFHS5kNQBbyf3Q0cMmY2TEQhM5+CnKRFZpfVBJB94 nU+fblq86FTQFEbYhXgLp2zYwWRGcbSUljOdR8feSsuDGLodbnTKCYz/u UB4fYcHu5yxmVEGHB+ExbgsDvEzhQEJosiKkeD3EUah++QOeXLrMUnIV+ CZuE1QHQUx0qUi7yxd0ZTxho6JL83AE+33G8IHyEJJmWpNMjO9+WB2/fa g==; X-IronPort-AV: E=McAfee;i="6400,9594,10326"; a="264813948" X-IronPort-AV: E=Sophos;i="5.90,286,1643702400"; d="scan'208";a="264813948" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 03:16:01 -0700 X-IronPort-AV: E=Sophos;i="5.90,286,1643702400"; d="scan'208";a="616086726" Received: from 984fee00be24.jf.intel.com ([10.165.54.246]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 03:16:00 -0700 From: Lei Wang To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com, wanpengli@tencent.com, jmattson@google.com, joro@8bytes.org Cc: lei4.wang@intel.com, chenyi.qiang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 7/8] KVM: VMX: Expose PKS to guest Date: Sun, 24 Apr 2022 03:15:56 -0700 Message-Id: <20220424101557.134102-8-lei4.wang@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220424101557.134102-1-lei4.wang@intel.com> References: <20220424101557.134102-1-lei4.wang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Chenyi Qiang Existence of PKS is enumerated via CPUID.(EAX=7,ECX=0):ECX[31]. It is enabled by setting CR4.PKS when long mode is active. PKS is only implemented when EPT is enabled and requires the support of VM_{ENTRY,EXIT}_LOAD_IA32_PKRS VMCS controls currently. 
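For reference, the enumeration described above can be probed from a guest as sketched below; this is illustrative user-space code, not part of the patch, and the CR4.PKS bit position mentioned in the comment is noted as an assumption since enabling PKS is a kernel-only operation.

/*
 * Illustrative probe: checks the PKS enumeration bit
 * CPUID.(EAX=7,ECX=0):ECX[31]. Actually enabling PKS means setting CR4.PKS
 * (bit 24, assumed here) while in long mode, which only the kernel can do.
 */
#include <cpuid.h>
#include <stdbool.h>
#include <stdio.h>

static bool cpu_enumerates_pks(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return ecx & (1u << 31);
}

int main(void)
{
	printf("PKS %s\n", cpu_enumerates_pks() ? "enumerated" : "not enumerated");
	return 0;
}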
Signed-off-by: Chenyi Qiang Co-developed-by: Lei Wang Signed-off-by: Lei Wang --- arch/x86/include/asm/kvm_host.h | 3 ++- arch/x86/kvm/cpuid.c | 13 +++++++++---- arch/x86/kvm/vmx/capabilities.h | 6 ++++++ arch/x86/kvm/vmx/vmx.c | 10 +++++++--- arch/x86/kvm/x86.h | 2 ++ 5 files changed, 26 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a245d9817f72..6f78ed784661 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -117,7 +117,8 @@ | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \ | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \ | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \ - | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP)) + | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \ + | X86_CR4_PKS)) #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index b24ca7f4ed7c..f419bdd7f6af 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -570,18 +570,23 @@ void kvm_set_cpu_caps(void) F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) | F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) | F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ | - F(SGX_LC) | F(BUS_LOCK_DETECT) + F(SGX_LC) | F(BUS_LOCK_DETECT) | F(PKS) ); /* Set LA57 based on hardware capability. */ if (cpuid_ecx(7) & F(LA57)) kvm_cpu_cap_set(X86_FEATURE_LA57); /* - * PKU not yet implemented for shadow paging and requires OSPKE - * to be set on the host. Clear it if that is not the case + * Protection Keys are not supported for shadow paging. PKU further + * requires OSPKE to be set on the host in order to use {RD,WR}PKRU to + * save/restore the guests PKRU. */ - if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE)) + if (!tdp_enabled) { kvm_cpu_cap_clear(X86_FEATURE_PKU); + kvm_cpu_cap_clear(X86_FEATURE_PKS); + } else if (!boot_cpu_has(X86_FEATURE_OSPKE)) { + kvm_cpu_cap_clear(X86_FEATURE_PKU); + } kvm_cpu_cap_mask(CPUID_7_EDX, F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) | diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index 3f430e218375..cc9c23ab85fd 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -104,6 +104,12 @@ static inline bool cpu_has_load_perf_global_ctrl(void) (vmcs_config.vmexit_ctrl & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL); } +static inline bool cpu_has_load_ia32_pkrs(void) +{ + return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PKRS) && + (vmcs_config.vmexit_ctrl & VM_EXIT_LOAD_IA32_PKRS); +} + static inline bool cpu_has_vmx_mpx(void) { return (vmcs_config.vmexit_ctrl & VM_EXIT_CLEAR_BNDCFGS) && diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 9d0588e85410..cbcb0d7b47a4 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3250,7 +3250,7 @@ void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) } /* - * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in + * SMEP/SMAP/PKU/PKS is disabled if CPU is in non-paging mode in * hardware. To emulate this behavior, SMEP/SMAP/PKU needs * to be manually disabled when guest switches to non-paging * mode. @@ -3258,10 +3258,11 @@ void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) * If !enable_unrestricted_guest, the CPU is always running * with CR0.PG=1 and CR4 needs to be modified. * If enable_unrestricted_guest, the CPU automatically - * disables SMEP/SMAP/PKU when the guest sets CR0.PG=0. + * disables SMEP/SMAP/PKU/PKS when the guest sets CR0.PG=0. 
*/ if (!is_paging(vcpu)) - hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE); + hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE | + X86_CR4_PKS); } vmcs_writel(CR4_READ_SHADOW, cr4); @@ -7500,6 +7501,9 @@ static __init void vmx_set_cpu_caps(void) if (cpu_has_vmx_waitpkg()) kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG); + + if (cpu_has_load_ia32_pkrs()) + kvm_cpu_cap_check_and_set(X86_FEATURE_PKS); } static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 7610f0d40b0f..997b85a20962 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -449,6 +449,8 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type); __reserved_bits |= X86_CR4_VMXE; \ if (!__cpu_has(__c, X86_FEATURE_PCID)) \ __reserved_bits |= X86_CR4_PCIDE; \ + if (!__cpu_has(__c, X86_FEATURE_PKS)) \ + __reserved_bits |= X86_CR4_PKS; \ __reserved_bits; \ }) From patchwork Sun Apr 24 10:15:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lei Wang X-Patchwork-Id: 12824848 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D76F2C433EF for ; Sun, 24 Apr 2022 10:16:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239036AbiDXKTX (ORCPT ); Sun, 24 Apr 2022 06:19:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60154 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239016AbiDXKTC (ORCPT ); Sun, 24 Apr 2022 06:19:02 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ABBD3140A5; Sun, 24 Apr 2022 03:16:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650795362; x=1682331362; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=y7Qa/RTMtrNvcVvyMDQP5HKRtGLkqkD84LwqpgUU1FI=; b=ZSGblX3uuTLwUKJI8o/iBRSXkFLcoMsTA/He0O60tCwrza35evwEiqcC IKU0VHJNfbajIj9rE+1pXC9O+goV/zlmtCPqG1x4KI4zPutZ/UWbeAWza zoFLywI9JV3weAV4lUf/5RWYM+C8qNQpmuwyRcw8HP2ukY1oWvG5bVWuE xWp6C/ufCsM+7hyVz/w7Pbr/Xl7uj9bOVobDaV9UHXofldb7/qvatz3v/ Xm9JIyijAAYB6s9iXPe+LVpkZeRouFA7VQqVb2TqMnCQ3zpni9gVFPTkf K1BxKZPZqQ3r5xVyA220zFbPWonowbYCRszdyV9FD7SDzd4Lk3I8eCwQo Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10326"; a="264813949" X-IronPort-AV: E=Sophos;i="5.90,286,1643702400"; d="scan'208";a="264813949" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 03:16:01 -0700 X-IronPort-AV: E=Sophos;i="5.90,286,1643702400"; d="scan'208";a="616086730" Received: from 984fee00be24.jf.intel.com ([10.165.54.246]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 03:16:01 -0700 From: Lei Wang To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com, wanpengli@tencent.com, jmattson@google.com, joro@8bytes.org Cc: lei4.wang@intel.com, chenyi.qiang@intel.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 8/8] KVM: VMX: Enable PKS for nested VM Date: Sun, 24 Apr 2022 03:15:57 -0700 Message-Id: <20220424101557.134102-9-lei4.wang@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220424101557.134102-1-lei4.wang@intel.com> 
References: <20220424101557.134102-1-lei4.wang@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Chenyi Qiang The PKS MSR (IA32_PKRS) is passed through to the guest directly. Configure its interception and the related VMCS fields to match the L0/L1 settings so that a nested VM runs PKS properly. Signed-off-by: Chenyi Qiang Co-developed-by: Lei Wang Signed-off-by: Lei Wang --- arch/x86/kvm/vmx/nested.c | 36 ++++++++++++++++++++++++++++++++++-- arch/x86/kvm/vmx/vmcs12.c | 2 ++ arch/x86/kvm/vmx/vmcs12.h | 4 ++++ arch/x86/kvm/vmx/vmx.c | 1 + arch/x86/kvm/vmx/vmx.h | 2 ++ 5 files changed, 43 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 58a1fa7defc9..dde359dacfcb 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -252,6 +252,7 @@ static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx, dest->ds_sel = src->ds_sel; dest->es_sel = src->es_sel; #endif + vmx_set_host_pkrs(dest, src->pkrs); } static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs) @@ -685,6 +686,9 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu, nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0, MSR_IA32_PRED_CMD, MSR_TYPE_W); + nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0, + MSR_IA32_PKRS, MSR_TYPE_RW); + kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false); vmx->nested.force_msr_bitmap_recalc = false; @@ -2433,6 +2437,10 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12) if (kvm_mpx_supported() && vmx->nested.nested_run_pending && (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)) vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs); + + if (vmx->nested.nested_run_pending && + (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS)) + vmcs_write64(GUEST_IA32_PKRS, vmcs12->guest_ia32_pkrs); } if (nested_cpu_has_xsaves(vmcs12)) @@ -2521,6 +2529,11 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending || !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))) vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs); + if (kvm_cpu_cap_has(X86_FEATURE_PKS) && + (!vmx->nested.nested_run_pending || + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS))) + vmcs_write64(GUEST_IA32_PKRS, vmx->nested.vmcs01_guest_pkrs); + vmx_set_rflags(vcpu, vmcs12->guest_rflags); /* EXCEPTION_BITMAP and CR0_GUEST_HOST_MASK should basically be the @@ -2897,6 +2910,10 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu, vmcs12->host_ia32_perf_global_ctrl))) return -EINVAL; + if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PKRS) && + CC(!kvm_pkrs_valid(vmcs12->host_ia32_pkrs))) + return -EINVAL; + #ifdef CONFIG_X86_64 ia32e = !!(vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE); #else @@ -3049,6 +3066,10 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu, if (nested_check_guest_non_reg_state(vmcs12)) return -EINVAL; + if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS) && + CC(!kvm_pkrs_valid(vmcs12->guest_ia32_pkrs))) + return -EINVAL; + return 0; } @@ -3384,6 +3405,10 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, (!from_vmentry || !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))) vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS); + if (kvm_cpu_cap_has(X86_FEATURE_PKS) && + (!from_vmentry || + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS))) + vmx->nested.vmcs01_guest_pkrs =
vmcs_read64(GUEST_IA32_PKRS); /* * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and* @@ -4029,6 +4054,7 @@ static bool is_vmcs12_ext_field(unsigned long field) case GUEST_IDTR_BASE: case GUEST_PENDING_DBG_EXCEPTIONS: case GUEST_BNDCFGS: + case GUEST_IA32_PKRS: return true; default: break; @@ -4080,6 +4106,8 @@ static void sync_vmcs02_to_vmcs12_rare(struct kvm_vcpu *vcpu, vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS); if (kvm_mpx_supported()) vmcs12->guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS); + if (vmx->nested.msrs.entry_ctls_high & VM_ENTRY_LOAD_IA32_PKRS) + vmcs12->guest_ia32_pkrs = vmcs_read64(GUEST_IA32_PKRS); vmx->nested.need_sync_vmcs02_to_vmcs12_rare = false; } @@ -4317,6 +4345,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL, vmcs12->host_ia32_perf_global_ctrl)); + if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PKRS) + vmcs_write64(GUEST_IA32_PKRS, vmcs12->host_ia32_pkrs); + /* Set L1 segment info according to Intel SDM 27.5.2 Loading Host Segment and Descriptor-Table Registers */ seg = (struct kvm_segment) { @@ -6559,7 +6590,8 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps) VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | - VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL; + VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | + VM_EXIT_LOAD_IA32_PKRS; msrs->exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER | @@ -6579,7 +6611,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps) VM_ENTRY_IA32E_MODE | #endif VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS | - VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL; + VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | VM_ENTRY_LOAD_IA32_PKRS; msrs->entry_ctls_high |= (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER); diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c index 2251b60920f8..7aad1b2f1d81 100644 --- a/arch/x86/kvm/vmx/vmcs12.c +++ b/arch/x86/kvm/vmx/vmcs12.c @@ -62,9 +62,11 @@ const unsigned short vmcs12_field_offsets[] = { FIELD64(GUEST_PDPTR2, guest_pdptr2), FIELD64(GUEST_PDPTR3, guest_pdptr3), FIELD64(GUEST_BNDCFGS, guest_bndcfgs), + FIELD64(GUEST_IA32_PKRS, guest_ia32_pkrs), FIELD64(HOST_IA32_PAT, host_ia32_pat), FIELD64(HOST_IA32_EFER, host_ia32_efer), FIELD64(HOST_IA32_PERF_GLOBAL_CTRL, host_ia32_perf_global_ctrl), + FIELD64(HOST_IA32_PKRS, host_ia32_pkrs), FIELD(PIN_BASED_VM_EXEC_CONTROL, pin_based_vm_exec_control), FIELD(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control), FIELD(EXCEPTION_BITMAP, exception_bitmap), diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h index 746129ddd5ae..4f41be3c351c 100644 --- a/arch/x86/kvm/vmx/vmcs12.h +++ b/arch/x86/kvm/vmx/vmcs12.h @@ -185,6 +185,8 @@ struct __packed vmcs12 { u16 host_gs_selector; u16 host_tr_selector; u16 guest_pml_index; + u64 host_ia32_pkrs; + u64 guest_ia32_pkrs; }; /* @@ -359,6 +361,8 @@ static inline void vmx_check_vmcs12_offsets(void) CHECK_OFFSET(host_gs_selector, 992); CHECK_OFFSET(host_tr_selector, 994); CHECK_OFFSET(guest_pml_index, 996); + CHECK_OFFSET(host_ia32_pkrs, 998); + CHECK_OFFSET(guest_ia32_pkrs, 1006); } extern const unsigned short vmcs12_field_offsets[]; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index cbcb0d7b47a4..a62dc65299d5 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7294,6 +7294,7 @@ static void nested_vmx_cr_fixed1_bits_update(struct 
kvm_vcpu *vcpu) cr4_fixed1_update(X86_CR4_PKE, ecx, feature_bit(PKU)); cr4_fixed1_update(X86_CR4_UMIP, ecx, feature_bit(UMIP)); cr4_fixed1_update(X86_CR4_LA57, ecx, feature_bit(LA57)); + cr4_fixed1_update(X86_CR4_PKS, ecx, feature_bit(PKS)); #undef cr4_fixed1_update } diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 91723a226bf3..82f79ac46d7b 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -222,6 +222,8 @@ struct nested_vmx { u64 vmcs01_debugctl; u64 vmcs01_guest_bndcfgs; + u64 vmcs01_guest_pkrs; + /* to migrate it to L1 if L2 writes to L1's CR8 directly */ int l1_tpr_threshold;
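For reference, a minimal sketch of the PKRS validity check that the nested host-state and guest-state consistency checks in patch 8/8 rely on. kvm_pkrs_valid() is introduced earlier in this series and its exact body is not quoted in this excerpt, so the sketch below is an assumption based only on the architectural rule that IA32_PKRS bits 63:32 are reserved and must be zero:

static inline bool kvm_pkrs_valid(u64 pkrs)
{
	/* Assumed check: IA32_PKRS bits 63:32 are reserved and must stay clear. */
	return !(pkrs >> 32);
}

Failing the nested VMLAUNCH/VMRESUME when a reserved bit is set mirrors the corresponding hardware VM-entry checks on the guest and host IA32_PKRS fields, which is why both nested_vmx_check_guest_state() and nested_vmx_check_host_state() apply the same test.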