From patchwork Sat Mar 1 07:34:25 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 13997390 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9221D1C3316 for ; Sat, 1 Mar 2025 07:34:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740814478; cv=none; b=uxMWtfOHkcfwCOu5+4uDfOjqzf2MRVgQkDjZcNJBP24W2GgixLER+AdY8anyw+auZRF4I6UEAgK9h20UiiGux4hJyjWF/ts/fmgRBT4H1bgBTDjxzIQoUXpHmvg8wkamtrXtaEDFpF/mkCQJYeIObDfB4DY9yX1j9EoTjiBKtcI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740814478; c=relaxed/simple; bh=DtG47iRANC5wZLgg0LnqyN8qTPJehZD/bILJ9tpVnw0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=c37Szh7gN2fGhLR/LrYPQsMi8B8aFXGCH0shJYNjJeRSjaEQgJ0xsEGPdOdHRjsf42EEARgAQ9SBDOgs+7OpDp6VeOatPZZcVgeTwSRYicIXQbFV8echdSIwJX83Ip7xahkeORd/0BnjnI8MMWvV2E8Zde97w4BN4lGC5oeaJyA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=gcjQdnZQ; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="gcjQdnZQ" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1740814475; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=p2tPBmgIhCufUhRamCTf9UFIDJg21WLE0DY219BoGCQ=; b=gcjQdnZQGzGLWrY+a2VhrMxrJHPewpn2RDKNpxH4rWsxn3lrEn2GygUSq/F/QQPMnRkbhi zr2I2pShNF31+jQhOGaXKGRGCIXCP7i1jFaRnn2kdF/lF4eHrJ/tXh8PygxrD9Boo+lQia 7Rf9dyMBleCvtP8A+lCqhDujdI8tpaU= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-665-e3AQlyKZPDio4mgbSPJI_w-1; Sat, 01 Mar 2025 02:34:32 -0500 X-MC-Unique: e3AQlyKZPDio4mgbSPJI_w-1 X-Mimecast-MFC-AGG-ID: e3AQlyKZPDio4mgbSPJI_w_1740814471 Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 5BEB219373D7; Sat, 1 Mar 2025 07:34:31 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 97A5A19560AE; Sat, 1 Mar 2025 07:34:30 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, yan.y.zhao@intel.com Subject: [PATCH 1/4] KVM: x86: Allow vendor code to disable quirks Date: Sat, 1 Mar 2025 02:34:25 -0500 Message-ID: <20250301073428.2435768-2-pbonzini@redhat.com> In-Reply-To: <20250301073428.2435768-1-pbonzini@redhat.com> References: <20250301073428.2435768-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org 
List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 In some cases, the handling of quirks is split between platform-specific code and generic code, or it is done entirely in generic code, but the relevant bug does not trigger on some platforms; for example, KVM_X86_QUIRK_CD_NW_CLEARED is only applicable to AMD systems. In that case, allow unaffected vendor modules to disable handling of the quirk. The quirk remains available in KVM_CAP_DISABLE_QUIRKS2, because that API tells userspace that KVM *knows* that some of its past behavior was bogus or just undesirable. In other words, it's plausible for userspace to refuse to run if a quirk is not listed by KVM_CAP_DISABLE_QUIRKS2. In kvm_check_has_quirk(), in addition to checking if a quirk is not explicitly disabled by the user, also verify if the quirk applies to the hardware. Signed-off-by: Yan Zhao Message-ID: <20250224070832.31394-1-yan.y.zhao@intel.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/vmx/vmx.c | 1 + arch/x86/kvm/x86.c | 1 + arch/x86/kvm/x86.h | 12 +++++++----- 3 files changed, 9 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 486fbdb4365c..75df4caea2f7 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -8506,6 +8506,7 @@ __init int vmx_hardware_setup(void) kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler); + kvm_caps.inapplicable_quirks = KVM_X86_QUIRK_CD_NW_CLEARED; return r; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 856ceeb4fb35..fd0a44e59314 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9775,6 +9775,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops) kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK); kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0; } + kvm_caps.inapplicable_quirks = 0; rdmsrl_safe(MSR_EFER, &kvm_host.efer); diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 8ce6da98b5a2..9af199c8e5c8 100644 --- 
a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -34,6 +34,7 @@ struct kvm_caps { u64 supported_xcr0; u64 supported_xss; u64 supported_perf_cap; + u64 inapplicable_quirks; }; struct kvm_host_values { @@ -354,11 +355,6 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu, return kvm_register_write_raw(vcpu, reg, val); } -static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk) -{ - return !(kvm->arch.disabled_quirks & quirk); -} - void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip); u64 get_kvmclock_ns(struct kvm *kvm); @@ -394,6 +390,12 @@ extern struct kvm_host_values kvm_host; extern bool enable_pmu; +static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk) +{ + u64 disabled_quirks = kvm_caps.inapplicable_quirks | kvm->arch.disabled_quirks; + return !(disabled_quirks & quirk); +} + /* * Get a filtered version of KVM's supported XCR0 that strips out dynamic * features for which the current process doesn't (yet) have permission to use. 
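For readers who want to exercise the new check outside the kernel, here is a minimal userspace sketch of the logic this patch adds to kvm_check_has_quirk(). It is a reading aid, not kernel code: every model_* and MODEL_* name is a hypothetical stand-in introduced for illustration, and the bit positions are illustrative rather than copied from the uapi header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions (assumption, not taken from the uapi header). */
#define MODEL_QUIRK_LINT0_REENABLED (1ull << 0)
#define MODEL_QUIRK_CD_NW_CLEARED   (1ull << 1)

/* Hypothetical stand-in for the global kvm_caps. */
struct model_caps {
	uint64_t inapplicable_quirks;	/* set once by vendor setup code */
};

/* Hypothetical stand-in for kvm->arch. */
struct model_vm {
	uint64_t disabled_quirks;	/* set per VM by userspace */
};

/*
 * Same shape as the new helper: a quirk is active only if neither the
 * vendor module nor userspace has switched it off.
 */
static bool model_check_has_quirk(const struct model_caps *caps,
				  const struct model_vm *vm, uint64_t quirk)
{
	uint64_t disabled = caps->inapplicable_quirks | vm->disabled_quirks;

	return !(disabled & quirk);
}

/* Walk through the interesting cases; aborts if an expectation fails. */
void model_quirk_demo(void)
{
	struct model_caps amd = { .inapplicable_quirks = 0 };
	struct model_caps intel = {
		.inapplicable_quirks = MODEL_QUIRK_CD_NW_CLEARED,
	};
	struct model_vm vm = { .disabled_quirks = 0 };

	/* The quirk applies where the vendor module did not opt out. */
	assert(model_check_has_quirk(&amd, &vm, MODEL_QUIRK_CD_NW_CLEARED));
	/* The vendor opt-out wins even though userspace disabled nothing. */
	assert(!model_check_has_quirk(&intel, &vm, MODEL_QUIRK_CD_NW_CLEARED));
	/* Other quirks are unaffected by the vendor mask. */
	assert(model_check_has_quirk(&intel, &vm, MODEL_QUIRK_LINT0_REENABLED));

	/* Userspace can still disable a quirk per VM. */
	vm.disabled_quirks = MODEL_QUIRK_LINT0_REENABLED;
	assert(!model_check_has_quirk(&intel, &vm, MODEL_QUIRK_LINT0_REENABLED));
}
```

Calling model_quirk_demo() from a small main() is enough to run the checks. Keeping the vendor mask separate from the per-VM mask is what lets the quirk stay listed for userspace while having no effect on unaffected hardware.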
From patchwork Sat Mar 1 07:34:26 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 13997393 From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, yan.y.zhao@intel.com Subject: [PATCH 2/4] KVM: x86: Introduce supported_quirks to block disabling quirks Date: Sat, 1 Mar 2025 02:34:26 -0500 Message-ID: <20250301073428.2435768-3-pbonzini@redhat.com> In-Reply-To: <20250301073428.2435768-1-pbonzini@redhat.com> References: <20250301073428.2435768-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: 
kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 From: Yan Zhao Introduce supported_quirks in kvm_caps to track which quirks userspace can disable on the current platform; a quirk that vendor code clears from this mask is force-enabled. Any quirk removed from kvm_caps.supported_quirks will never be included in kvm->arch.disabled_quirks, and will cause the ioctl to fail if passed to KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2). Signed-off-by: Yan Zhao Message-ID: <20250224070832.31394-1-yan.y.zhao@intel.com> Signed-off-by: Paolo Bonzini Reviewed-by: Xiaoyao Li --- arch/x86/kvm/x86.c | 7 ++++--- arch/x86/kvm/x86.h | 2 ++ 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fd0a44e59314..a97e58916b6a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4782,7 +4782,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = enable_pmu ? KVM_CAP_PMU_VALID_MASK : 0; break; case KVM_CAP_DISABLE_QUIRKS2: - r = KVM_X86_VALID_QUIRKS; + r = kvm_caps.supported_quirks; break; case KVM_CAP_X86_NOTIFY_VMEXIT: r = kvm_caps.has_notify_vmexit; @@ -6521,11 +6521,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, switch (cap->cap) { case KVM_CAP_DISABLE_QUIRKS2: r = -EINVAL; - if (cap->args[0] & ~KVM_X86_VALID_QUIRKS) + if (cap->args[0] & ~kvm_caps.supported_quirks) break; fallthrough; case KVM_CAP_DISABLE_QUIRKS: - kvm->arch.disabled_quirks = cap->args[0]; + kvm->arch.disabled_quirks = cap->args[0] & kvm_caps.supported_quirks; r = 0; break; case KVM_CAP_SPLIT_IRQCHIP: { @@ -9775,6 +9775,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops) kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK); kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0; } + kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS; kvm_caps.inapplicable_quirks = 0; rdmsrl_safe(MSR_EFER, &kvm_host.efer); diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 9af199c8e5c8..f2672b14388c 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ 
-34,6 +34,8 @@ struct kvm_caps { u64 supported_xcr0; u64 supported_xss; u64 supported_perf_cap; + + u64 supported_quirks; u64 inapplicable_quirks; }; From patchwork Sat Mar 1 07:34:27 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 13997394 From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, yan.y.zhao@intel.com, Kevin Tian Subject: [PATCH 3/4] KVM: x86: Introduce Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT Date: Sat, 1 Mar 2025 02:34:27 -0500 Message-ID: <20250301073428.2435768-4-pbonzini@redhat.com> In-Reply-To: <20250301073428.2435768-1-pbonzini@redhat.com> References: <20250301073428.2435768-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 From: Yan Zhao Introduce an Intel-specific quirk, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT, that makes KVM ignore guest PAT while the quirk is enabled. KVM can safely honor guest PAT on Intel platforms when the CPU supports self-snoop. However, honoring guest PAT was reverted by commit 9d70f3fec144 ("Revert "KVM: VMX: Always honor guest PAT on CPUs that support self-snoop""), because UC accesses are very slow on certain Intel platforms [1]. Honoring guest PAT on those platforms may break old guests that accidentally specify PAT as UC. Those guests never experienced the slowness, because KVM previously always forced WB. See [2]. So, introduce the Intel-specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT. KVM enables the quirk on all Intel platforms by default to avoid breaking old unmodifiable guests. Newer userspace can disable the quirk to have KVM honor guest PAT. The quirk only takes effect on Intel platforms; on AMD platforms it is ignored, because KVM always honors guest PAT when running on AMD. 
Suggested-by: Paolo Bonzini Suggested-by: Sean Christopherson Cc: Kevin Tian Signed-off-by: Yan Zhao Link: https://lore.kernel.org/all/Ztl9NWCOupNfVaCA@yzhao56-desk.sh.intel.com # [1] Link: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com # [2] Message-ID: <20250224070946.31482-1-yan.y.zhao@intel.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 22 +++++++++++++++++++ arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/mmu.h | 2 +- arch/x86/kvm/mmu/mmu.c | 11 ++++++---- arch/x86/kvm/svm/svm.c | 1 + arch/x86/kvm/vmx/vmx.c | 39 +++++++++++++++++++++++++++------ arch/x86/kvm/x86.c | 2 +- 7 files changed, 65 insertions(+), 13 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 2d75edc9db4f..1f13e47a65fa 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -8157,6 +8157,28 @@ KVM_X86_QUIRK_STUFF_FEATURE_MSRS By default, at vCPU creation, KVM sets the and 0x489), as KVM does now allow them to be set by userspace (KVM sets them based on guest CPUID, for safety purposes). + +KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT By default, on Intel platforms, KVM ignores + guest PAT and forces the effective memory + type to WB in EPT. The quirk is not available + on Intel platforms which are incapable of + safely honoring guest PAT (i.e., without CPU + self-snoop, KVM always ignores guest PAT and + forces effective memory type to WB). It is + also ignored on AMD platforms or, on Intel, + when a VM has non-coherent DMA devices + assigned; KVM always honors guest PAT in + such case. The quirk is needed to avoid + slowdowns on certain Intel Xeon platforms + (e.g. ICX, SPR) where self-snoop feature is + supported but UC is slow enough to cause + issues with some older guests that use + UC instead of WC to map the video RAM. 
+ Userspace can disable the quirk to honor + guest PAT if it knows that there is no such + guest software, for example if it does not + expose a bochs graphics device (which is + known to have had a buggy driver). =================================== ============================================ 7.32 KVM_CAP_MAX_VCPU_ID diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 89cc7a18ef45..db55a70e173c 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -441,6 +441,7 @@ struct kvm_sync_regs { #define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS (1 << 6) #define KVM_X86_QUIRK_SLOT_ZAP_ALL (1 << 7) #define KVM_X86_QUIRK_STUFF_FEATURE_MSRS (1 << 8) +#define KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT (1 << 9) #define KVM_STATE_NESTED_FORMAT_VMX 0 #define KVM_STATE_NESTED_FORMAT_SVM 1 diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 47e64a3c4ce3..f999c15d8d3e 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -232,7 +232,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, return -(u32)fault & errcode; } -bool kvm_mmu_may_ignore_guest_pat(void); +bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm); int kvm_mmu_post_init_vm(struct kvm *kvm); void kvm_mmu_pre_destroy_vm(struct kvm *kvm); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index e6eb3a262f8d..bcf395d7ec53 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4663,17 +4663,20 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, } #endif -bool kvm_mmu_may_ignore_guest_pat(void) +bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm) { /* * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to * honor the memtype from the guest's PAT so that guest accesses to * memory that is DMA'd aren't cached against the guest's wishes. 
As a - * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA, - * KVM _always_ ignores guest PAT (when EPT is enabled). + * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA, + * KVM _always_ ignores guest PAT when EPT is enabled and either quirk + * KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is enabled or the CPU lacks the + * ability to safely honor guest PAT. */ - return shadow_memtype_mask; + return shadow_memtype_mask && + kvm_check_has_quirk(kvm, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT); } int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index ebaa5a41db07..2254dbebddac 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -5426,6 +5426,7 @@ static __init int svm_hardware_setup(void) */ allow_smaller_maxphyaddr = !npt_enabled; + kvm_caps.inapplicable_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT; return 0; err: diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 75df4caea2f7..5365efb22e96 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7599,6 +7599,33 @@ int vmx_vm_init(struct kvm *kvm) return 0; } +/* + * Ignore guest PAT when the CPU doesn't support self-snoop to safely honor + * guest PAT, or quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is turned on. Always + * honor guest PAT when a non-coherent DMA device is attached. + * + * Honoring guest PAT means letting the guest control memory types. + * - On Intel CPUs that lack the self-snoop feature, honoring guest PAT may + * result in unexpected behavior. So always ignore guest PAT on those CPUs. + * + * - KVM's ABI is to trust the guest for attached non-coherent DMA devices to + * function correctly (non-coherent DMA devices need the guest to flush CPU + * caches properly). So honor guest PAT to avoid breaking the existing ABI. + * + * - On certain Intel CPUs (e.g. 
SPR, ICX), though the self-snoop feature is + * supported, UC is slow enough to cause issues with some older guests (e.g. + * an old version of the bochs driver uses ioremap() instead of ioremap_wc() + * to map the video RAM, causing the wayland desktop to fail to start + * correctly). To avoid breaking those old guests that rely on KVM to force + * memory type to WB, only honor guest PAT when quirk + * KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is disabled. + */ +static inline bool vmx_ignore_guest_pat(struct kvm *kvm) +{ + return !kvm_arch_has_noncoherent_dma(kvm) && + kvm_check_has_quirk(kvm, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT); +} + u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { /* @@ -7608,13 +7635,8 @@ u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) if (is_mmio) return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT; - /* - * Force WB and ignore guest PAT if the VM does NOT have a non-coherent - * device attached. Letting the guest control memory types on Intel - * CPUs may result in unexpected behavior, and so KVM's ABI is to trust - * the guest to behave only as a last resort. - */ - if (!kvm_arch_has_noncoherent_dma(vcpu->kvm)) + /* Force WB if ignoring guest PAT */ + if (vmx_ignore_guest_pat(vcpu->kvm)) return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT; return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT); @@ -8506,6 +8528,9 @@ __init int vmx_hardware_setup(void) kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler); + /* Must use WB if the CPU does not have self-snoop. 
*/ + if (!static_cpu_has(X86_FEATURE_SELFSNOOP)) + kvm_caps.supported_quirks &= ~KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT; kvm_caps.inapplicable_quirks = KVM_X86_QUIRK_CD_NW_CLEARED; return r; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a97e58916b6a..b221f273ec77 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13544,7 +13544,7 @@ static void kvm_noncoherent_dma_assignment_start_or_stop(struct kvm *kvm) * (or last) non-coherent device is (un)registered to so that new SPTEs * with the correct "ignore guest PAT" setting are created. */ - if (kvm_mmu_may_ignore_guest_pat()) + if (kvm_mmu_may_ignore_guest_pat(kvm)) kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL)); } From patchwork Sat Mar 1 07:34:28 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 13997391 From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, yan.y.zhao@intel.com Subject: [PATCH 4/4] KVM: TDX: Always honor guest PAT on TDX enabled platforms Date: Sat, 1 Mar 2025 02:34:28 -0500 Message-ID: <20250301073428.2435768-5-pbonzini@redhat.com> In-Reply-To: <20250301073428.2435768-1-pbonzini@redhat.com> References: <20250301073428.2435768-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 From: Yan Zhao Always honor guest PAT in KVM-managed EPTs on TDX-enabled platforms by making the self-snoop feature a hard dependency for TDX and by force-disabling the KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT quirk for TDX VMs. The quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT only affects the memory type of KVM-managed EPTs. For the TDX-module-managed private EPT, the memory type is always forced to WB now. Honoring guest PAT in KVM-managed EPTs ensures KVM does not invoke kvm_zap_gfn_range() when attaching/detaching non-coherent DMA devices; such a call would zap the mirrored EPTs for TDs and would incorrectly zap the private EPT that is managed by the TDX module. As a new platform, TDX always comes with self-snoop support and has no old, poorly written yet unmodifiable guests to break. So, simply force-disable the KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT quirk for TDX VMs. Suggested-by: Sean Christopherson Signed-off-by: Yan Zhao Message-ID: <20250224071039.31511-1-yan.y.zhao@intel.com> [Use disabled_quirks instead of supported_quirks. 
- Paolo] Signed-off-by: Paolo Bonzini --- arch/x86/kvm/vmx/tdx.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index b6f6f6e2f02e..4450fd99cb4c 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -624,6 +624,7 @@ int tdx_vm_init(struct kvm *kvm) kvm->arch.has_protected_state = true; kvm->arch.has_private_mem = true; + kvm->arch.disabled_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT; /* * Because guest TD is protected, VMM can't parse the instruction in TD. @@ -3470,6 +3471,11 @@ int __init tdx_bringup(void) goto success_disable_tdx; } + if (!cpu_feature_enabled(X86_FEATURE_SELFSNOOP)) { + pr_err("Self-snoop is required for TDX\n"); + goto success_disable_tdx; + } + if (!cpu_feature_enabled(X86_FEATURE_TDX_HOST_PLATFORM)) { pr_err("tdx: no TDX private KeyIDs available\n"); goto success_disable_tdx;
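Taken together, the four patches decide whether KVM forces WB and ignores guest PAT for a given VM. The sketch below models that decision in plain userspace C, under stated assumptions: every model_* and MODEL_* name is hypothetical, MODEL_VALID_QUIRKS merely stands in for KVM_X86_VALID_QUIRKS, and only the (1 << 9) bit value is taken from the uapi hunk in patch 3. The VMX setup is simplified to the self-snoop check alone. It is meant as a reading aid, not as the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_QUIRK_EPT_IGNORE_GUEST_PAT (1ull << 9)	/* value from patch 3 */
#define MODEL_VALID_QUIRKS ((1ull << 10) - 1)		/* assumption: bits 0..9 */

struct model_caps {
	uint64_t supported_quirks;	/* quirks userspace may disable */
	uint64_t inapplicable_quirks;	/* quirks with no effect on this vendor */
};

struct model_vm {
	uint64_t disabled_quirks;
	bool noncoherent_dma;
};

/* Patches 2 and 3: without self-snoop, the quirk cannot be disabled. */
static struct model_caps model_vmx_setup(bool has_selfsnoop)
{
	struct model_caps caps = {
		.supported_quirks = MODEL_VALID_QUIRKS,
		.inapplicable_quirks = 0,
	};

	if (!has_selfsnoop)
		caps.supported_quirks &= ~MODEL_QUIRK_EPT_IGNORE_GUEST_PAT;
	return caps;
}

/* Patch 2: reject unsupported bits (the kernel returns -EINVAL). */
static int model_disable_quirks2(const struct model_caps *caps,
				 struct model_vm *vm, uint64_t mask)
{
	if (mask & ~caps->supported_quirks)
		return -1;
	vm->disabled_quirks = mask & caps->supported_quirks;
	return 0;
}

/* Patch 4: a TDX VM force-disables the quirk at VM init. */
static void model_tdx_vm_init(struct model_vm *vm)
{
	vm->disabled_quirks |= MODEL_QUIRK_EPT_IGNORE_GUEST_PAT;
}

/* Patch 1: vendor opt-outs and userspace opt-outs are merged. */
static bool model_check_has_quirk(const struct model_caps *caps,
				  const struct model_vm *vm, uint64_t quirk)
{
	return !((caps->inapplicable_quirks | vm->disabled_quirks) & quirk);
}

/* Patch 3: same condition as vmx_ignore_guest_pat(). */
static bool model_ignore_guest_pat(const struct model_caps *caps,
				   const struct model_vm *vm)
{
	return !vm->noncoherent_dma &&
	       model_check_has_quirk(caps, vm, MODEL_QUIRK_EPT_IGNORE_GUEST_PAT);
}

/* Walk through the cases described in the commit messages. */
void model_pat_demo(void)
{
	struct model_caps snoop = model_vmx_setup(true);
	struct model_caps nosnoop = model_vmx_setup(false);
	struct model_vm vm = { 0, false }, old = { 0, false };
	struct model_vm dma = { 0, true }, td = { 0, false };

	assert(model_ignore_guest_pat(&snoop, &vm));	/* default: force WB */
	assert(!model_disable_quirks2(&snoop, &vm, MODEL_QUIRK_EPT_IGNORE_GUEST_PAT));
	assert(!model_ignore_guest_pat(&snoop, &vm));	/* quirk off: honor PAT */

	assert(model_disable_quirks2(&nosnoop, &old, MODEL_QUIRK_EPT_IGNORE_GUEST_PAT) == -1);
	assert(model_ignore_guest_pat(&nosnoop, &old));	/* no self-snoop: WB */

	assert(!model_ignore_guest_pat(&snoop, &dma));	/* non-coherent DMA */

	model_tdx_vm_init(&td);
	assert(!model_ignore_guest_pat(&snoop, &td));	/* TDX: honor PAT */
}
```

Calling model_pat_demo() from a small main() runs all five cases. The point of the model is that the three masks play different roles: supported_quirks gates what userspace may request, inapplicable_quirks is a vendor-wide opt-out, and disabled_quirks is per VM, which is why the TDX case can be handled in tdx_vm_init() alone.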