From patchwork Mon Dec 18 14:05:42 2023
X-Patchwork-Submitter: Tao Su
X-Patchwork-Id: 13497040
From: Tao Su
To: kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, eddie.dong@intel.com, chao.gao@intel.com, xiaoyao.li@intel.com, yuan.yao@linux.intel.com, yi1.lai@intel.com, xudong.hao@intel.com, chao.p.peng@intel.com, tao1.su@linux.intel.com
Subject: [PATCH 1/2] x86: KVM: Limit guest physical bits when 5-level EPT is unsupported
Date: Mon, 18 Dec 2023 22:05:42 +0800
Message-Id: <20231218140543.870234-2-tao1.su@linux.intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20231218140543.870234-1-tao1.su@linux.intel.com>
References: <20231218140543.870234-1-tao1.su@linux.intel.com>

When the host doesn't support 5-level EPT, bits 51:48 of the guest
physical address must all be zero. Otherwise an EPT violation always
occurs, and the current handler can't resolve it if the GPA is in a RAM
region. As a result, the faulting instruction is executed again and
again, causing an infinite loop of EPT violations.

Six KVM selftests time out due to this issue:
  kvm:access_tracking_perf_test
  kvm:demand_paging_test
  kvm:dirty_log_test
  kvm:dirty_log_perf_test
  kvm:kvm_page_table_test
  kvm:memslot_modification_stress_test

The above selftests add a RAM region close to max_gfn. If the host has
52 physical bits but doesn't support 5-level EPT, accessing that RAM
region triggers an infinite loop of EPT violations.

Since Intel CPUID currently doesn't report the max guest physical bits
the way AMD does, introduce kvm_mmu_tdp_maxphyaddr() to limit the guest
physical bits when TDP is enabled, so that KVM reports max guest
physical bits that are smaller than the host's.

When the guest has fewer physical bits than the host, some GPAs are
illegal from the guest's perspective but still legal from the hardware's
perspective; accesses to them should be trapped so that a #PF can be
injected.
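The guest MAXPHYADDR selection described above can be modeled outside the kernel. This is a minimal sketch, not kernel code: every name in it (model_tdp_maxphyaddr(), model_guest_maxphyaddr(), model_gpa_translatable(), model_max_gfn()) is hypothetical, it assumes 4 KiB pages, and it only covers the TDP-enabled path where userspace supplies no explicit guest MAXPHYADDR (the !g_phys_as case in the cpuid.c hunk).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: 4-level EPT translates 48 guest-physical bits,
 * 5-level EPT translates 57 (same values the patch returns).
 */
static unsigned int model_tdp_maxphyaddr(int max_tdp_level)
{
	return max_tdp_level == 5 ? 57 : 48;
}

/* Guest MAXPHYADDR when TDP is on and no explicit value was provided. */
static unsigned int model_guest_maxphyaddr(unsigned int host_phys_bits,
					   int max_tdp_level)
{
	unsigned int tdp_bits = model_tdp_maxphyaddr(max_tdp_level);

	/* Equivalent of min(phys_as, kvm_mmu_tdp_maxphyaddr()). */
	return host_phys_bits < tdp_bits ? host_phys_bits : tdp_bits;
}

/* True if no GPA bit at or above the TDP limit is set. */
static bool model_gpa_translatable(uint64_t gpa, int max_tdp_level)
{
	return (gpa >> model_tdp_maxphyaddr(max_tdp_level)) == 0;
}

/* Highest guest frame number for a given MAXPHYADDR (4 KiB pages assumed). */
static uint64_t model_max_gfn(unsigned int maxphyaddr)
{
	return ((uint64_t)1 << (maxphyaddr - 12)) - 1;
}
```

With a 52-bit host and only 4-level EPT, the modeled guest MAXPHYADDR drops to 48. A test that derives max_gfn from the MAXPHYADDR KVM reports (an assumption about how the selftests size their address space) would then stay inside the range EPT can translate.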

KVM already has a parameter, allow_smaller_maxphyaddr, to support the
case where guest.MAXPHYADDR < host.MAXPHYADDR. It is disabled by default
when EPT is enabled, and users can enable it when loading the kvm-intel
module. When allow_smaller_maxphyaddr is enabled and the guest accesses
an address that is illegal from its perspective, KVM uses the EPT
violation to emulate the instruction, inject a #PF, and determine the
#PF error code.

Reported-by: Yi Lai
Signed-off-by: Tao Su
Tested-by: Yi Lai
Tested-by: Xudong Hao
---
 arch/x86/kvm/cpuid.c   | 5 +++--
 arch/x86/kvm/mmu.h     | 1 +
 arch/x86/kvm/mmu/mmu.c | 7 +++++++
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index dda6fc4cfae8..91933ca739ad 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1212,12 +1212,13 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		 *
 		 * If TDP is enabled but an explicit guest MAXPHYADDR is not
 		 * provided, use the raw bare metal MAXPHYADDR as reductions to
-		 * the HPAs do not affect GPAs.
+		 * the HPAs do not affect GPAs, but ensure guest MAXPHYADDR
+		 * doesn't exceed the bits that TDP can translate.
 		 */
 		if (!tdp_enabled)
 			g_phys_as = boot_cpu_data.x86_phys_bits;
 		else if (!g_phys_as)
-			g_phys_as = phys_as;
+			g_phys_as = min(phys_as, kvm_mmu_tdp_maxphyaddr());
 
 		entry->eax = g_phys_as | (virt_as << 8);
 		entry->ecx &= ~(GENMASK(31, 16) | GENMASK(11, 8));
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index bb8c86eefac0..1c7d649fcf6b 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -115,6 +115,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 			  u64 fault_address, char *insn, int insn_len);
 void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
 					struct kvm_mmu *mmu);
+unsigned int kvm_mmu_tdp_maxphyaddr(void);
 
 int kvm_mmu_load(struct kvm_vcpu *vcpu);
 void kvm_mmu_unload(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c57e181bba21..72634d6b61b2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5177,6 +5177,13 @@ void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
 	reset_guest_paging_metadata(vcpu, mmu);
 }
 
+/* guest-physical-address bits limited by TDP */
+unsigned int kvm_mmu_tdp_maxphyaddr(void)
+{
+	return max_tdp_level == 5 ? 57 : 48;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_tdp_maxphyaddr);
+
 static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
 {
 	/* tdp_root_level is architecture forced level, use it if nonzero */

From patchwork Mon Dec 18 14:05:43 2023
X-Patchwork-Submitter: Tao Su
X-Patchwork-Id: 13497041
From: Tao Su
To: kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, eddie.dong@intel.com, chao.gao@intel.com, xiaoyao.li@intel.com, yuan.yao@linux.intel.com, yi1.lai@intel.com, xudong.hao@intel.com, chao.p.peng@intel.com, tao1.su@linux.intel.com
Subject: [PATCH 2/2] x86: KVM: Emulate instruction when GPA can't be translated by EPT
Date: Mon, 18 Dec 2023 22:05:43 +0800
Message-Id: <20231218140543.870234-3-tao1.su@linux.intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20231218140543.870234-1-tao1.su@linux.intel.com>
References: <20231218140543.870234-1-tao1.su@linux.intel.com>

With 4-level EPT, bits 51:48 of the guest physical address must all be
zero; otherwise, an EPT violation always occurs, which KVM currently
does not expect.

Even though KVM advertises the max physical bits to the guest, userspace
may ignore the MAXPHYADDR in CPUID and configure the guest with more
physical bits. Rejecting invalid guest physical bits in KVM is an
option, but it would break the current KVM ABI. For example, QEMU
currently ignores the physical bits advertised by KVM and, by default
with '-cpu host', uses the host physical bits as the guest physical
bits. Although we would like to send a patch to QEMU, rejecting such
configurations would still cause backward-compatibility issues.

For a GPA that can't be translated by EPT but is within
host.MAXPHYADDR, emulation is the best choice: KVM will inject a #PF
for a GPA that is invalid from the guest's perspective and will try to
emulate the instruction, which minimizes the impact on the guest as
much as possible.
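The early check added to handle_ept_violation() can be sketched in the same hedged way. model_rsvd_bits() below is a standalone re-implementation assumed to match the semantics of KVM's rsvd_bits() (a mask with bits s through e set, inclusive). model_ept_violation() and its enum are hypothetical and only classify the exit; what the emulator does afterwards (for example, injecting a #PF) is not modeled.

```c
#include <stdint.h>

/* Mask with bits s..e (inclusive) set; assumed to mirror rsvd_bits(). */
static uint64_t model_rsvd_bits(int s, int e)
{
	if (e < s)
		return 0;
	/* 2ULL << (e - s) is well defined for e - s <= 63 (wraps to 0 at 63). */
	return ((2ULL << (e - s)) - 1) << s;
}

/* Hypothetical classification of an EPT violation by the new check. */
enum ept_action {
	EPT_CONTINUE,	/* fall through to the existing EPT violation handling */
	EPT_EMULATE,	/* GPA has bits EPT can't translate: emulate instead */
};

static enum ept_action model_ept_violation(uint64_t gpa,
					   unsigned int tdp_maxphyaddr)
{
	/* Same test as: gpa & rsvd_bits(kvm_mmu_tdp_maxphyaddr(), 63) */
	if (gpa & model_rsvd_bits((int)tdp_maxphyaddr, 63))
		return EPT_EMULATE;
	return EPT_CONTINUE;
}
```

With 4-level EPT (limit 48), a GPA with bit 51 set takes the emulation path, while the same GPA falls through to the normal handler when 5-level EPT raises the limit to 57.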

Signed-off-by: Tao Su
Tested-by: Yi Lai
---
 arch/x86/kvm/vmx/vmx.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index be20a60047b1..a8aa2cfa2f5d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5774,6 +5774,13 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.exit_qualification = exit_qualification;
 
+	/*
+	 * Emulate the instruction when accessing a GPA that has any bits set
+	 * beyond the guest-physical bits that EPT can translate.
+	 */
+	if (unlikely(gpa & rsvd_bits(kvm_mmu_tdp_maxphyaddr(), 63)))
+		return kvm_emulate_instruction(vcpu, 0);
+
 	/*
 	 * Check that the GPA doesn't exceed physical memory limits, as that is
 	 * a guest page fault. We have to emulate the instruction here, because