From patchwork Mon Feb 24 07:08:32 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13987499
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com
Cc: rick.p.edgecombe@intel.com, kevin.tian@intel.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Yan Zhao
Subject: [PATCH 1/3] KVM: x86: Introduce supported_quirks for platform-specific valid quirks
Date: Mon, 24 Feb 2025 15:08:32 +0800
Message-ID: <20250224070832.31394-1-yan.y.zhao@intel.com>
X-Mailer: git-send-email 2.43.2
In-Reply-To: <20250224070716.31360-1-yan.y.zhao@intel.com>
References: <20250224070716.31360-1-yan.y.zhao@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

Introduce supported_quirks in kvm_caps to store platform-specific valid
quirks.

Rename KVM_X86_VALID_QUIRKS to KVM_X86_VALID_QUIRKS_COMMON, representing
valid quirks common to all x86 platforms.
Initialize kvm_caps.supported_quirks to KVM_X86_VALID_QUIRKS_COMMON in the
common vendor initializer kvm_x86_vendor_init().

Use kvm_caps.supported_quirks to respond to user queries about valid quirks
and to reject unsupported quirks provided by the user.

In kvm_check_has_quirk(), in addition to checking that a quirk is not
explicitly disabled by the user, also verify that the quirk is supported by
the platform. This ensures KVM does not treat a quirk as enabled if it is
not explicitly disabled by the user but is outside the platform-supported
mask.

This is preparation for introducing quirks specific to certain platforms,
e.g., quirks present only on Intel platforms and not on AMD.

No functional changes intended.

Signed-off-by: Yan Zhao
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/x86.c              |  5 +++--
 arch/x86/kvm/x86.h              | 12 +++++++-----
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 089cf2c82414..8d15e604613b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2409,7 +2409,7 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 #define KVM_CLOCK_VALID_FLAGS						\
 	(KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
 
-#define KVM_X86_VALID_QUIRKS			\
+#define KVM_X86_VALID_QUIRKS_COMMON		\
 	(KVM_X86_QUIRK_LINT0_REENABLED |	\
 	 KVM_X86_QUIRK_CD_NW_CLEARED |		\
 	 KVM_X86_QUIRK_LAPIC_MMIO_HOLE |	\
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3078e09fc841..4f1b73620c6a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4782,7 +4782,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = enable_pmu ? KVM_CAP_PMU_VALID_MASK : 0;
 		break;
 	case KVM_CAP_DISABLE_QUIRKS2:
-		r = KVM_X86_VALID_QUIRKS;
+		r = kvm_caps.supported_quirks;
 		break;
 	case KVM_CAP_X86_NOTIFY_VMEXIT:
 		r = kvm_caps.has_notify_vmexit;
@@ -6521,7 +6521,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 	switch (cap->cap) {
 	case KVM_CAP_DISABLE_QUIRKS2:
 		r = -EINVAL;
-		if (cap->args[0] & ~KVM_X86_VALID_QUIRKS)
+		if (cap->args[0] & ~kvm_caps.supported_quirks)
 			break;
 		fallthrough;
 	case KVM_CAP_DISABLE_QUIRKS:
@@ -9775,6 +9775,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
 		kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
 	}
+	kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS_COMMON;
 
 	rdmsrl_safe(MSR_EFER, &kvm_host.efer);
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 8ce6da98b5a2..772d5c320be1 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -34,6 +34,7 @@ struct kvm_caps {
 	u64 supported_xcr0;
 	u64 supported_xss;
 	u64 supported_perf_cap;
+	u64 supported_quirks;
 };
 
 struct kvm_host_values {
@@ -354,11 +355,6 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu,
 	return kvm_register_write_raw(vcpu, reg, val);
 }
 
-static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
-{
-	return !(kvm->arch.disabled_quirks & quirk);
-}
-
 void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
 
 u64 get_kvmclock_ns(struct kvm *kvm);
@@ -394,6 +390,12 @@ extern struct kvm_host_values kvm_host;
 
 extern bool enable_pmu;
 
+static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
+{
+	return (kvm_caps.supported_quirks & quirk) &&
+	       !(kvm->arch.disabled_quirks & quirk);
+}
+
 /*
  * Get a filtered version of KVM's supported XCR0 that strips out dynamic
  * features for which the current process doesn't (yet) have permission to use.
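Note for readers of patch 1: the two-level check it introduces (platform
mask first, then the per-VM disable mask) can be sketched in userspace C.
This is only a model under stated assumptions, not kernel code:
caps_model, vm_model, the QUIRK_* bit values, and both helper names are
hypothetical stand-ins, -1 stands in for -EINVAL, and how the real ioctl
stores the accepted mask is not shown in the diff, so the plain assignment
below is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only (not the kernel's). */
#define QUIRK_COMMON_A		(1ULL << 0)
#define QUIRK_COMMON_B		(1ULL << 1)
#define QUIRK_VENDOR_ONLY	(1ULL << 9)
#define VALID_QUIRKS_COMMON	(QUIRK_COMMON_A | QUIRK_COMMON_B)

/* Stand-in for the supported_quirks field added to struct kvm_caps. */
struct caps_model { uint64_t supported_quirks; };
/* Stand-in for kvm->arch.disabled_quirks. */
struct vm_model { uint64_t disabled_quirks; };

/* A quirk is active only if the platform supports it AND it is not disabled. */
static bool check_has_quirk(const struct caps_model *caps,
			    const struct vm_model *vm, uint64_t quirk)
{
	return (caps->supported_quirks & quirk) &&
	       !(vm->disabled_quirks & quirk);
}

/* Models the KVM_CAP_DISABLE_QUIRKS2 check: bits outside the mask are rejected. */
static int disable_quirks2(const struct caps_model *caps,
			   struct vm_model *vm, uint64_t mask)
{
	if (mask & ~caps->supported_quirks)
		return -1;	/* stands in for -EINVAL */
	vm->disabled_quirks = mask;	/* assumption: storage is not in the diff */
	return 0;
}

/* Small walk-through of the behaviour described in the commit message. */
static void quirk_model_selfcheck(void)
{
	struct caps_model caps = { .supported_quirks = VALID_QUIRKS_COMMON };
	struct vm_model vm = { .disabled_quirks = 0 };

	/* Common quirks are active by default; unsupported ones never are. */
	assert(check_has_quirk(&caps, &vm, QUIRK_COMMON_A));
	assert(!check_has_quirk(&caps, &vm, QUIRK_VENDOR_ONLY));
	assert(disable_quirks2(&caps, &vm, QUIRK_VENDOR_ONLY) == -1);

	/* A vendor module widening the mask makes the quirk visible. */
	caps.supported_quirks |= QUIRK_VENDOR_ONLY;
	assert(check_has_quirk(&caps, &vm, QUIRK_VENDOR_ONLY));
	assert(disable_quirks2(&caps, &vm, QUIRK_VENDOR_ONLY) == 0);
	assert(!check_has_quirk(&caps, &vm, QUIRK_VENDOR_ONLY));
}
```

The point of the extra mask test is that a bit absent from supported_quirks
reads as "quirk not present" even though userspace never disabled it, which
is what lets later patches add or remove vendor-specific bits at setup time.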
From patchwork Mon Feb 24 07:09:45 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13987500
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com
Cc: rick.p.edgecombe@intel.com, kevin.tian@intel.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Yan Zhao
Subject: [PATCH 2/3] KVM: x86: Introduce Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT
Date: Mon, 24 Feb 2025 15:09:45 +0800
Message-ID: <20250224070946.31482-1-yan.y.zhao@intel.com>
X-Mailer: git-send-email 2.43.2
In-Reply-To: <20250224070716.31360-1-yan.y.zhao@intel.com>
References: <20250224070716.31360-1-yan.y.zhao@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

Introduce an Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT to have
KVM ignore guest PAT when this quirk is enabled.

KVM is able to safely honor guest PAT on Intel platforms when CPU feature
self-snoop is supported. However, KVM honoring guest PAT was reverted in
commit 9d70f3fec144 ("Revert "KVM: VMX: Always honor guest PAT on CPUs that
support self-snoop""), due to UC access on certain Intel platforms being
very slow [1]. Honoring guest PAT on those platforms may break some old
guests that accidentally specify PAT as UC. Those old guests may never
expect the slowness since KVM previously always forced WB. See [2].

So, introduce an Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT.
KVM enables the quirk on all Intel platforms by default to avoid breaking
old unmodifiable guests. Newer userspace can disable this quirk to turn on
honoring guest PAT.

The quirk is only valid on Intel's platforms and is absent on AMD's
platforms as KVM always honors guest PAT when running on AMD.

Suggested-by: Paolo Bonzini
Suggested-by: Sean Christopherson
Cc: Kevin Tian
Signed-off-by: Yan Zhao
Link: https://lore.kernel.org/all/Ztl9NWCOupNfVaCA@yzhao56-desk.sh.intel.com # [1]
Link: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com # [2]
---
 Documentation/virt/kvm/api.rst  | 28 +++++++++++++++++++++++
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/mmu.h              |  2 +-
 arch/x86/kvm/mmu/mmu.c          | 14 +++++++-----
 arch/x86/kvm/vmx/vmx.c          | 39 +++++++++++++++++++++++++++------
 arch/x86/kvm/x86.c              |  2 +-
 6 files changed, 72 insertions(+), 14 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index d5363d88fa52..c22211c2f54c 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8164,6 +8164,34 @@ KVM_X86_QUIRK_STUFF_FEATURE_MSRS    By default, at vCPU creation, KVM sets the
                                     and 0x489), as KVM does now allow them to
                                     be set by userspace (KVM sets them based on
                                     guest CPUID, for safety purposes).
+
+KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT  By default, on Intel platforms, KVM ignores
+                                    guest PAT and forces the effective memory
+                                    type to WB in EPT.  The quirk has no effect
+                                    when KVM runs on Intel platforms which are
+                                    incapable of safely honoring guest PAT
+                                    (i.e., without CPU feature self-snoop, KVM
+                                    always ignores guest PAT and forces
+                                    effective memory type to WB) or when a VM
+                                    has assigned non-coherent DMA devices (KVM
+                                    always honors guest PAT with assigned
+                                    non-coherent DMA devices). On certain Intel
+                                    Xeon platforms (e.g. ICX, SPR), though
+                                    self-snoop feature is supported, UC is slow
+                                    enough to cause issues with some older
+                                    guests (e.g. an old version of bochs driver
+                                    uses ioremap() instead of ioremap_wc() to
+                                    map the video RAM, causing wayland desktop
+                                    to fail to start correctly). To prevent
+                                    breaking older guest software, KVM enables
+                                    the quirk by default on Intel platforms.
+                                    Userspace can disable the quirk to honor
+                                    guest PAT when there is no older
+                                    unmodifiable guest software that relies on
+                                    KVM to force memory type to WB. Note, the
+                                    quirk is not visible on AMD's platforms,
+                                    i.e., KVM always honors guest PAT when
+                                    running on AMD.
 =================================== ============================================
 
 7.32 KVM_CAP_MAX_VCPU_ID
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 89cc7a18ef45..db55a70e173c 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -441,6 +441,7 @@ struct kvm_sync_regs {
 #define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS	(1 << 6)
 #define KVM_X86_QUIRK_SLOT_ZAP_ALL		(1 << 7)
 #define KVM_X86_QUIRK_STUFF_FEATURE_MSRS	(1 << 8)
+#define KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT	(1 << 9)
 
 #define KVM_STATE_NESTED_FORMAT_VMX	0
 #define KVM_STATE_NESTED_FORMAT_SVM	1
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 47e64a3c4ce3..f999c15d8d3e 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -232,7 +232,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return -(u32)fault & errcode;
 }
 
-bool kvm_mmu_may_ignore_guest_pat(void);
+bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm);
 
 int kvm_mmu_post_init_vm(struct kvm *kvm);
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e6eb3a262f8d..28d0b73bf685 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4663,17 +4663,21 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
 }
 #endif
 
-bool kvm_mmu_may_ignore_guest_pat(void)
+bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm)
 {
 	/*
 	 * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM
 	 * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to
 	 * honor the memtype from the guest's PAT so that guest accesses to
 	 * memory that is DMA'd aren't cached against the guest's wishes.  As a
-	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA,
-	 * KVM _always_ ignores guest PAT (when EPT is enabled).
-	 */
-	return shadow_memtype_mask;
+	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA,
+	 * KVM _always_ ignores guest PAT when EPT is enabled and either quirk
+	 * KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is enabled or the CPU lacks the
+	 * ability to safely honor guest PAT.
+	 */
+	return shadow_memtype_mask &&
+	       (kvm_check_has_quirk(kvm, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT) ||
+		!static_cpu_has(X86_FEATURE_SELFSNOOP));
 }
 
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 486fbdb4365c..9fb884175bfd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7599,6 +7599,34 @@ int vmx_vm_init(struct kvm *kvm)
 	return 0;
 }
 
+/*
+ * Ignore guest PAT when the CPU doesn't support self-snoop to safely honor
+ * guest PAT, or quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is turned on. Always
+ * honor guest PAT when there's a non-coherent DMA device attached.
+ *
+ * Honoring guest PAT means letting the guest control memory types.
+ * - On Intel CPUs that lack self-snoop feature, honoring guest PAT may result
+ *   in unexpected behavior. So always ignore guest PAT on those CPUs.
+ *
+ * - KVM's ABI is to trust the guest for attached non-coherent DMA devices to
+ *   function correctly (non-coherent DMA devices need the guest to flush CPU
+ *   caches properly). So honor guest PAT to avoid breaking existing ABI.
+ *
+ * - On certain Intel CPUs (e.g. SPR, ICX), though self-snoop feature is
+ *   supported, UC is slow enough to cause issues with some older guests (e.g.
+ *   an old version of bochs driver uses ioremap() instead of ioremap_wc() to
+ *   map the video RAM, causing wayland desktop to fail to get started
+ *   correctly). To avoid breaking those old guests that rely on KVM to force
+ *   memory type to WB, only honor guest PAT when quirk
+ *   KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is disabled.
+ */
+static inline bool vmx_ignore_guest_pat(struct kvm *kvm)
+{
+	return !kvm_arch_has_noncoherent_dma(kvm) &&
+	       (kvm_check_has_quirk(kvm, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT) ||
+		!static_cpu_has(X86_FEATURE_SELFSNOOP));
+}
+
 u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	/*
@@ -7608,13 +7636,8 @@ u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 	if (is_mmio)
 		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
 
-	/*
-	 * Force WB and ignore guest PAT if the VM does NOT have a non-coherent
-	 * device attached.  Letting the guest control memory types on Intel
-	 * CPUs may result in unexpected behavior, and so KVM's ABI is to trust
-	 * the guest to behave only as a last resort.
-	 */
-	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
+	/* Force WB if ignoring guest PAT */
+	if (vmx_ignore_guest_pat(vcpu->kvm))
 		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
 
 	return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);
@@ -8498,6 +8521,8 @@ __init int vmx_hardware_setup(void)
 			return r;
 	}
 
+	kvm_caps.supported_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;
+
 	vmx_set_cpu_caps();
 
 	r = alloc_kvm_area();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4f1b73620c6a..8ae96449e6e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13550,7 +13550,7 @@ static void kvm_noncoherent_dma_assignment_start_or_stop(struct kvm *kvm)
 	 * (or last) non-coherent device is (un)registered to so that new SPTEs
 	 * with the correct "ignore guest PAT" setting are created.
 	 */
-	if (kvm_mmu_may_ignore_guest_pat())
+	if (kvm_mmu_may_ignore_guest_pat(kvm))
 		kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
 }
 

From patchwork Mon Feb 24 07:10:39 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 13987501
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com
Cc: rick.p.edgecombe@intel.com, kevin.tian@intel.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Yan Zhao
Subject: [PATCH 3/3] KVM: TDX: Always honor guest PAT on TDX enabled platforms
Date: Mon, 24 Feb 2025 15:10:39 +0800
Message-ID: <20250224071039.31511-1-yan.y.zhao@intel.com>
X-Mailer: git-send-email 2.43.2
In-Reply-To: <20250224070716.31360-1-yan.y.zhao@intel.com>
References: <20250224070716.31360-1-yan.y.zhao@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

Always honor guest PAT in KVM-managed EPTs on TDX enabled platforms by
making the self-snoop feature a hard dependency for TDX and making quirk
KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT not a valid quirk once TDX is enabled.

The quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT only affects the memory type
of KVM-managed EPTs. For the TDX-module-managed private EPT, the memory
type is always forced to WB now.

Honoring guest PAT in KVM-managed EPTs ensures KVM does not invoke
kvm_zap_gfn_range() when attaching/detaching non-coherent DMA devices,
which would cause mirrored EPTs for TDs to be zapped, leading to the
TDX-module-managed private EPT being incorrectly zapped.

As a new platform, TDX always comes with the self-snoop feature and carries
no risk of breaking old, poorly written yet unmodifiable guests. So, simply
make the quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT invalid on TDX enabled
platforms.

Suggested-by: Sean Christopherson
Signed-off-by: Yan Zhao
---
 Documentation/virt/kvm/api.rst | 20 +++++++++++---------
 arch/x86/kvm/vmx/main.c        |  1 +
 arch/x86/kvm/vmx/tdx.c         |  5 +++++
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index c22211c2f54c..5954c5cde33d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8165,9 +8165,11 @@ KVM_X86_QUIRK_STUFF_FEATURE_MSRS    By default, at vCPU creation, KVM sets the
                                     be set by userspace (KVM sets them based on
                                     guest CPUID, for safety purposes).
 
-KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT  By default, on Intel platforms, KVM ignores
-                                    guest PAT and forces the effective memory
-                                    type to WB in EPT.  The quirk has no effect
+KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT  By default, on Intel platforms except TDX,
+                                    KVM ignores guest PAT and forces the
+                                    effective memory type to WB in EPT.  The
+                                    quirk only affects the memory type of
+                                    KVM-managed EPTs. The quirk has no effect
                                     when KVM runs on Intel platforms which are
                                     incapable of safely honoring guest PAT
                                     (i.e., without CPU feature self-snoop, KVM
@@ -8184,14 +8186,14 @@ KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT  By default, on Intel platforms, KVM ignores
                                     map the video RAM, causing wayland desktop
                                     to fail to start correctly). To prevent
                                     breaking older guest software, KVM enables
-                                    the quirk by default on Intel platforms.
-                                    Userspace can disable the quirk to honor
-                                    guest PAT when there is no older
+                                    the quirk by default on Intel platforms
+                                    except TDX. Userspace can disable the quirk
+                                    to honor guest PAT when there is no older
                                     unmodifiable guest software that relies on
                                     KVM to force memory type to WB. Note, the
-                                    quirk is not visible on AMD's platforms,
-                                    i.e., KVM always honors guest PAT when
-                                    running on AMD.
+                                    quirk is not visible on Intel TDX or AMD's
+                                    platforms, i.e., KVM always honors guest PAT
+                                    when running on Intel TDX or AMD.
 =================================== ============================================
 
 7.32 KVM_CAP_MAX_VCPU_ID
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index f586e09b5acf..1fa0364faa60 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -1092,6 +1092,7 @@ static int __init vt_init(void)
 		vcpu_align = max_t(unsigned, vcpu_align,
 				   __alignof__(struct vcpu_tdx));
 		kvm_caps.supported_vm_types |= BIT(KVM_X86_TDX_VM);
+		kvm_caps.supported_quirks &= ~KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;
 	}
 
 	/*
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index e73c9fcf213c..7d063cacc9c9 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -3483,6 +3483,11 @@ int __init tdx_bringup(void)
 		goto success_disable_tdx;
 	}
 
+	if (!cpu_feature_enabled(X86_FEATURE_SELFSNOOP)) {
+		pr_err("Self-snoop is required for TDX\n");
+		goto success_disable_tdx;
+	}
+
 	if (!kvm_can_support_tdx()) {
 		pr_err("tdx: no TDX private KeyIDs available\n");
 		goto success_disable_tdx;
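
Note on the series as a whole: the decision made by vmx_ignore_guest_pat()
in patch 2, combined with the mask change in patch 3, can be modelled as a
pure function of four booleans. This is a sketch for reasoning about the
cases, not kernel code: struct pat_inputs, its field names, and both
functions are hypothetical, and each field stands in for the kernel
predicate named in its comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the inputs to the "ignore guest PAT" decision. */
struct pat_inputs {
	bool noncoherent_dma;	/* models kvm_arch_has_noncoherent_dma() */
	bool quirk_supported;	/* quirk bit present in kvm_caps.supported_quirks */
	bool quirk_disabled;	/* userspace disabled the quirk for this VM */
	bool has_selfsnoop;	/* models X86_FEATURE_SELFSNOOP */
};

/* Returns true when the model would force WB and ignore guest PAT. */
static bool ignore_guest_pat(const struct pat_inputs *in)
{
	/* Same shape as kvm_check_has_quirk(): supported and not disabled. */
	bool quirk_on = in->quirk_supported && !in->quirk_disabled;

	return !in->noncoherent_dma && (quirk_on || !in->has_selfsnoop);
}

/* Walks through the cases described in the commit messages. */
static void pat_model_selfcheck(void)
{
	/* Intel default: quirk supported and enabled, so guest PAT is ignored. */
	struct pat_inputs in = {
		.noncoherent_dma = false, .quirk_supported = true,
		.quirk_disabled = false, .has_selfsnoop = true,
	};
	assert(ignore_guest_pat(&in));

	/* Userspace disables the quirk: guest PAT is honored. */
	in.quirk_disabled = true;
	assert(!ignore_guest_pat(&in));

	/* No self-snoop: ignored even with the quirk disabled. */
	in.has_selfsnoop = false;
	assert(ignore_guest_pat(&in));

	/* Non-coherent DMA attached: always honored. */
	in.noncoherent_dma = true;
	assert(!ignore_guest_pat(&in));

	/* TDX enabled: quirk bit cleared from the mask, self-snoop required. */
	struct pat_inputs tdx = {
		.noncoherent_dma = false, .quirk_supported = false,
		.quirk_disabled = false, .has_selfsnoop = true,
	};
	assert(!ignore_guest_pat(&tdx));
}
```

In the TDX case the result no longer depends on noncoherent_dma, which is
consistent with the stated goal of patch 3: attaching or detaching a
non-coherent DMA device does not change the memory-type policy, so there is
nothing for kvm_zap_gfn_range() to refresh.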