From patchwork Wed Mar 13 12:24:52 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: George Dunlap
X-Patchwork-Id: 13591362
From: George Dunlap
To: xen-devel@lists.xenproject.org
Cc: George Dunlap
Subject: [PATCH v2 1/3] x86: Move SVM features exposed to guest into hvm_max_cpu_policy
Date: Wed, 13 Mar 2024 12:24:52 +0000
Message-Id: <20240313122454.965566-2-george.dunlap@cloud.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240313122454.965566-1-george.dunlap@cloud.com>
References: <20240313122454.965566-1-george.dunlap@cloud.com>
MIME-Version: 1.0

Currently (nested) SVM features we're willing to expose to the guest
are defined in calculate_host_policy, and stored in host_cpu_policy.
This is the wrong place for this; move it into
calculate_hvm_max_policy(), and store it in hvm_max_cpu_policy.

Signed-off-by: George Dunlap
Reviewed-by: Jan Beulich
---
v2:
- New
---
 xen/arch/x86/cpu-policy.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 2acc27632f..bd047456eb 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -398,19 +398,6 @@ static void __init calculate_host_policy(void)
     if ( vpmu_mode == XENPMU_MODE_OFF )
         p->basic.raw[0xa] = EMPTY_LEAF;
 
-    if ( p->extd.svm )
-    {
-        /* Clamp to implemented features which require hardware support. */
-        p->extd.raw[0xa].d &= ((1u << SVM_FEATURE_NPT) |
-                               (1u << SVM_FEATURE_LBRV) |
-                               (1u << SVM_FEATURE_NRIPS) |
-                               (1u << SVM_FEATURE_PAUSEFILTER) |
-                               (1u << SVM_FEATURE_DECODEASSISTS));
-        /* Enable features which are always emulated. */
-        p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) |
-                               (1u << SVM_FEATURE_TSCRATEMSR));
-    }
-
     /* 0x000000ce MSR_INTEL_PLATFORM_INFO */
     /* probe_cpuid_faulting() sanity checks presence of MISC_FEATURES_ENABLES */
     p->platform_info.cpuid_faulting = cpu_has_cpuid_faulting;
@@ -741,6 +728,23 @@ static void __init calculate_hvm_max_policy(void)
     if ( !cpu_has_vmx )
         __clear_bit(X86_FEATURE_PKS, fs);
 
+    /*
+     * Make adjustments to possible (nested) virtualization features exposed
+     * to the guest
+     */
+    if ( p->extd.svm )
+    {
+        /* Clamp to implemented features which require hardware support. */
+        p->extd.raw[0xa].d &= ((1u << SVM_FEATURE_NPT) |
+                               (1u << SVM_FEATURE_LBRV) |
+                               (1u << SVM_FEATURE_NRIPS) |
+                               (1u << SVM_FEATURE_PAUSEFILTER) |
+                               (1u << SVM_FEATURE_DECODEASSISTS));
+        /* Enable features which are always emulated. */
+        p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) |
+                               (1u << SVM_FEATURE_TSCRATEMSR));
+    }
+
     guest_common_max_feature_adjustments(fs);
     guest_common_feature_adjustments(fs);
 

From patchwork Wed Mar 13 12:24:53 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: George Dunlap
X-Patchwork-Id: 13591363
From: George Dunlap
To: xen-devel@lists.xenproject.org
Cc: George Dunlap, Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu
Subject: [PATCH v2 2/3] nestedsvm: Disable TscRateMSR
Date: Wed, 13 Mar 2024 12:24:53 +0000
Message-Id: <20240313122454.965566-3-george.dunlap@cloud.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240313122454.965566-1-george.dunlap@cloud.com>
References: <20240313122454.965566-1-george.dunlap@cloud.com>
MIME-Version: 1.0

The primary purpose of TSC scaling, from our perspective, is to
maintain the fiction of an "invariant TSC" across migrates between
platforms with different clock speeds.

On AMD, the TscRateMSR CPUID bit is unconditionally enabled in the
"host cpuid", even if the hardware doesn't actually support it.
According to c/s fd14a1943c4 ("nestedsvm: Support TSC Rate MSR"),
testing showed that emulating TSC scaling in an L1 was more expensive
than emulating TSC scaling on an L0 (due to extra sets of vmexit /
vmenter).

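For anyone who hasn't looked at the arithmetic behind the scaling discussed above: as far as I understand the AMD manual, the TSC ratio is a fixed-point value with the integer part in bits 39:32 and the fraction in bits 31:0, so 1.0 is 1 << 32. What follows is only a standalone C sketch of that multiply, modelled on the scale_tsc() helper in xen/arch/x86/hvm/svm/svm.c. The names TSC_RATIO_ONE, scale_tsc_sketch and compose_ratio_sketch are made up for illustration and are not Xen code, and compose_ratio_sketch is merely my assumption of what "composing" two ratios could look like.

```c
#include <assert.h>
#include <stdint.h>

/* 1.0 in fixed point: integer part above bit 32, fraction in bits 31:0. */
#define TSC_RATIO_ONE (UINT64_C(1) << 32)

/*
 * Compute (tsc * ratio) >> 32 without a 128-bit intermediate.
 * With tsc = tsc_h * 2^32 + tsc_l and ratio = mult * 2^32 + frac:
 *   tsc * ratio * 2^-32
 *     = tsc * mult + tsc_h * frac + ((tsc_l * frac) >> 32)
 * As in the helper this is modelled on, tsc * mult can wrap for very
 * large inputs; no attempt is made to detect that here.
 */
static uint64_t scale_tsc_sketch(uint64_t tsc, uint64_t ratio)
{
    uint64_t mult = ratio >> 32;
    uint64_t frac = ratio & UINT32_MAX;
    uint64_t scaled;

    scaled  = tsc * mult;
    scaled += (tsc >> 32) * frac;
    scaled += ((tsc & UINT32_MAX) * frac) >> 32;

    return scaled;
}

/*
 * Hypothetical helper (an assumption, not existing code): if L0 and L1
 * both scale the TSC, the ratio seen by L2 would be the fixed-point
 * product of the two ratios.
 */
static uint64_t compose_ratio_sketch(uint64_t l0_ratio, uint64_t l1_ratio)
{
    /* Assumes bits 63:40 are reserved and clear in both ratios. */
    assert(!(l0_ratio >> 40) && !(l1_ratio >> 40));

    return scale_tsc_sketch(l0_ratio, l1_ratio);
}
```

With these definitions, scaling 1000 ticks by a ratio of 1.5 (0x180000000) gives 1500, and composing 1.5 with 2.0 gives 3.0 (0x300000000). The sketch ignores overflow of the composed ratio beyond the 8-bit integer part, which a real implementation would have to handle.
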
However, the current implementation seems to be broken.

First of all, the final L2 scaling ratio should be a composition of
the L0 scaling ratio and the L1 scaling ratio; there's no indication
this is being done anywhere.

Secondly, it's not clear that the L1 tsc scaling ratio actually
affects the L0 tsc scaling ratio. The stored value (ns_tscratio) is
used to affect the tsc *offset*, but doesn't seem to actually be
factored into d->hvm.tsc_scaling_ratio. (Which shouldn't be
per-domain anyway, but per-vcpu.) Having the *offset* scaled
according to the nested scaling without the actual RDTSC itself also
being scaled has got to produce inconsistent results.

For now, just disable the functionality entirely until we can
implement it properly:

- Don't set TSCRATEMSR in the host CPUID policy

- Remove MSR_AMD64_TSC_RATIO emulation handling, so that the guest
  gets a #GP if it tries to access it (as it should when
  TSCRATEMSR is clear)

- Remove ns_tscratio from struct nestedsvm, and all code that touches
  it

Unfortunately this means ripping out the scaling calculation stuff as
well, since it's only used in the nested case; it's there in the git
tree if we need it for reference when we re-introduce it.

Signed-off-by: George Dunlap
Acked-by: Jan Beulich
---
v2:
- Port over move to hvm_max_cpu_policy

CC: Jan Beulich
CC: Andrew Cooper
CC: "Roger Pau Monné"
CC: Wei Liu
---
 xen/arch/x86/cpu-policy.c                    |  3 +-
 xen/arch/x86/hvm/svm/nestedsvm.c             |  2 -
 xen/arch/x86/hvm/svm/svm.c                   | 57 --------------------
 xen/arch/x86/include/asm/hvm/svm/nestedsvm.h |  5 --
 4 files changed, 1 insertion(+), 66 deletions(-)

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index bd047456eb..5952ff20e6 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -741,8 +741,7 @@ static void __init calculate_hvm_max_policy(void)
                                (1u << SVM_FEATURE_PAUSEFILTER) |
                                (1u << SVM_FEATURE_DECODEASSISTS));
         /* Enable features which are always emulated. */
-        p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) |
-                               (1u << SVM_FEATURE_TSCRATEMSR));
+        p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN);
     }
 
     guest_common_max_feature_adjustments(fs);
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index e4e01add8c..a5319ab729 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -146,8 +146,6 @@ int cf_check nsvm_vcpu_reset(struct vcpu *v)
     svm->ns_msr_hsavepa = INVALID_PADDR;
     svm->ns_ovvmcb_pa = INVALID_PADDR;
 
-    svm->ns_tscratio = DEFAULT_TSC_RATIO;
-
     svm->ns_cr_intercepts = 0;
     svm->ns_dr_intercepts = 0;
     svm->ns_exception_intercepts = 0;
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index b551eac807..34b9f603bc 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -777,43 +777,6 @@ static int cf_check svm_get_guest_pat(struct vcpu *v, u64 *gpat)
     return 1;
 }
 
-static uint64_t scale_tsc(uint64_t host_tsc, uint64_t ratio)
-{
-    uint64_t mult, frac, scaled_host_tsc;
-
-    if ( ratio == DEFAULT_TSC_RATIO )
-        return host_tsc;
-
-    /*
-     * Suppose the most significant 32 bits of host_tsc and ratio are
-     * tsc_h and mult, and the least 32 bits of them are tsc_l and frac,
-     * then
-     *     host_tsc * ratio * 2^-32
-     *     = host_tsc * (mult * 2^32 + frac) * 2^-32
-     *     = host_tsc * mult + (tsc_h * 2^32 + tsc_l) * frac * 2^-32
-     *     = host_tsc * mult + tsc_h * frac + ((tsc_l * frac) >> 32)
-     *
-     * Multiplications in the last two terms are between 32-bit integers,
-     * so both of them can fit in 64-bit integers.
-     *
-     * Because mult is usually less than 10 in practice, it's very rare
-     * that host_tsc * mult can overflow a 64-bit integer.
-     */
-    mult = ratio >> 32;
-    frac = ratio & ((1ULL << 32) - 1);
-    scaled_host_tsc = host_tsc * mult;
-    scaled_host_tsc += (host_tsc >> 32) * frac;
-    scaled_host_tsc += ((host_tsc & ((1ULL << 32) - 1)) * frac) >> 32;
-
-    return scaled_host_tsc;
-}
-
-static uint64_t svm_get_tsc_offset(uint64_t host_tsc, uint64_t guest_tsc,
-    uint64_t ratio)
-{
-    return guest_tsc - scale_tsc(host_tsc, ratio);
-}
-
 static void cf_check svm_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
 {
     struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
@@ -832,18 +795,8 @@ static void cf_check svm_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
 
     if ( nestedhvm_vcpu_in_guestmode(v) )
     {
-        struct nestedsvm *svm = &vcpu_nestedsvm(v);
-
         n2_tsc_offset = vmcb_get_tsc_offset(n2vmcb) -
                         vmcb_get_tsc_offset(n1vmcb);
-        if ( svm->ns_tscratio != DEFAULT_TSC_RATIO )
-        {
-            uint64_t guest_tsc = hvm_get_guest_tsc_fixed(v, at_tsc);
-
-            n2_tsc_offset = svm_get_tsc_offset(guest_tsc,
-                                               guest_tsc + n2_tsc_offset,
-                                               svm->ns_tscratio);
-        }
         vmcb_set_tsc_offset(n1vmcb, offset);
     }
 
@@ -1921,10 +1874,6 @@ static int cf_check svm_msr_read_intercept(
         *msr_content = nsvm->ns_msr_hsavepa;
         break;
 
-    case MSR_AMD64_TSC_RATIO:
-        *msr_content = nsvm->ns_tscratio;
-        break;
-
     case MSR_AMD_OSVW_ID_LENGTH:
     case MSR_AMD_OSVW_STATUS:
         if ( !d->arch.cpuid->extd.osvw )
@@ -2103,12 +2052,6 @@ static int cf_check svm_msr_write_intercept(
             goto gpf;
         break;
 
-    case MSR_AMD64_TSC_RATIO:
-        if ( msr_content & TSC_RATIO_RSVD_BITS )
-            goto gpf;
-        nsvm->ns_tscratio = msr_content;
-        break;
-
     case MSR_IA32_MCx_MISC(4): /* Threshold register */
     case MSR_F10_MC4_MISC1 ... MSR_F10_MC4_MISC3:
         /*
diff --git a/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h b/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h
index 406fc082b1..45d658ad01 100644
--- a/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h
+++ b/xen/arch/x86/include/asm/hvm/svm/nestedsvm.h
@@ -18,11 +18,6 @@ struct nestedsvm {
      */
     uint64_t ns_ovvmcb_pa;
 
-    /* virtual tscratio holding the value l1 guest writes to the
-     * MSR_AMD64_TSC_RATIO MSR.
-     */
-    uint64_t ns_tscratio;
-
     /* Cached real intercepts of the l2 guest */
     uint32_t ns_cr_intercepts;
     uint32_t ns_dr_intercepts;

From patchwork Wed Mar 13 12:24:54 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: George Dunlap
X-Patchwork-Id: 13591364
From: George Dunlap
To: xen-devel@lists.xenproject.org
Cc: George Dunlap, Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu, Roger Pau Monné, Jun Nakajima, Kevin Tian
Subject: [PATCH v2 3/3] svm/nestedsvm: Introduce nested capabilities bit
Date: Wed, 13 Mar 2024 12:24:54 +0000
Message-Id: <20240313122454.965566-4-george.dunlap@cloud.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240313122454.965566-1-george.dunlap@cloud.com>
References: <20240313122454.965566-1-george.dunlap@cloud.com>
MIME-Version: 1.0

In order to make implementation and testing tractable, we will require
specific host functionality. Add a nested_virt bit to hvm_funcs.caps,
and return an error if a domain is created with nested virt and this
bit isn't set. Create VMX and SVM callbacks to be executed from
nestedhvm_setup(), which is guaranteed to execute after all
command-line options have been processed.

For VMX, start with always enabling it if HAP is present; this
shouldn't change current behavior.

For SVM, require some basic functionality, adding a document
explaining the rationale.

NB that only SVM CPUID bits 0-7 have been considered. Bits 10-16 may
be considered in a follow-up patch.

Signed-off-by: George Dunlap
Acked-by: Jan Beulich
---
v2:
- Fixed typo in title
- Added hvm_nested_virt_supported() def for !CONFIG_HVM
- Rebased over previous changes
- Tweak some wording in document
- Require npt rather than nrips twice
- Remove stray __init from header
- Set caps.nested_virt from callback from nestedhvm_setup()

CC: Andrew Cooper
CC: George Dunlap
CC: Jan Beulich
CC: Julien Grall
CC: Stefano Stabellini
CC: Wei Liu
CC: "Roger Pau Monné"
CC: Jun Nakajima
CC: Kevin Tian
---
 docs/designs/nested-svm-cpu-features.md  | 111 +++++++++++++++++++++++
 xen/arch/x86/domain.c                    |   6 ++
 xen/arch/x86/hvm/nestedhvm.c             |  10 ++
 xen/arch/x86/hvm/svm/nestedsvm.c         |  14 +++
 xen/arch/x86/hvm/vmx/vvmx.c              |   8 ++
 xen/arch/x86/include/asm/hvm/hvm.h       |  16 +++-
 xen/arch/x86/include/asm/hvm/nestedhvm.h |   4 +
 7 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/docs/designs/nested-svm-cpu-features.md b/docs/designs/nested-svm-cpu-features.md
new file mode 100644
index 0000000000..837a96df05
--- /dev/null
+++ b/docs/designs/nested-svm-cpu-features.md
@@ -0,0 +1,111 @@
+# Nested SVM (AMD) CPUID requirements
+
+The first step in making nested SVM production-ready is to make sure
+that all features are implemented and well-tested. To make this
+tractable, we will initially be limiting the "supported" range of
+nested virt to a specific subset of host and guest features. This
+document describes the criteria for deciding on features, and the
+rationale behind each feature.
+
+For AMD, all virtualization-related features can be found in CPUID
+leaf 8000000A:edx
+
+# Criteria
+
+- Processor support: At a minimum we want to support processors from
+  the last 5 years. All things being equal, we'd prefer to cover
+  older processors than not. Bits 0:7 were available in the very
+  earliest processors; and even through bit 15 we should be pretty
+  good support-wise.
+
+- Faithfulness to hardware: We need the behavior of the "virtual cpu"
+  from the L1 hypervisor's perspective to be as close as possible to
+  the original hardware. In particular, the behavior of the hardware
+  on error paths 1) is not easy to understand or test, and 2) can be
+  the source of surprising vulnerabilities. (See XSA-7 for an example
+  of a case where subtle error-handling differences can open up a
+  privilege escalation.) We should avoid emulating any bit of the
+  hardware with complex error paths if we can at all help it.
+
+- Cost of implementation: We want to minimize the cost of
+  implementation (where this includes bringing an existing sub-par
+  implementation up to speed). All things being equal, we'll favor a
+  configuration which does not require any new implementation.
+
+- Performance: All things being equal, we'd prefer to choose a set of
+  L0 / L1 CPUID bits that are faster rather than slower.
+
+
+# Bits
+
+- 0 `NP` *Nested Paging*: Required both for L0 and L1.
+
+  Based primarily on faithfulness and performance, as well as
+  potential cost of implementation. Available on earliest hardware,
+  so no compatibility issues.
+
+- 1 `LbrVirt` *LBR / debugging virtualization*: Require for L0 and L1.
+
+  For L0 this is required for performance: There's no way to tell the
+  guests not to use the LBR-related registers; and if the guest does,
+  then you have to save and restore all LBR-related registers on
+  context switch, which is prohibitive. Furthermore, the additional
+  emulation risks introducing a security-relevant difference.
+
+  Providing it to L1 when we have it in L0 is basically free, and
+  already implemented.
+
+  Just require it and provide it.
+
+- 2 `SVML` *SVM Lock*: Not required for L0, not provided to L1
+
+  Seems to be about enabling an operating system to prevent "blue
+  pill" attacks against itself.
+
+  Xen doesn't use it, nor provide it; so it would need to be
+  implemented. The best way to protect a guest OS is to leave nested
+  virt disabled in the tools.
+
+- 3 `NRIPS` *NRIP Save*: Require for both L0 and L1
+
+  If NRIPS is not present, the software interrupt injection
+  functionality can't be used; and Xen has to emulate it. That's
+  another source of potential security issues. If hardware supports
+  it, then providing it to guest is basically free.
+
+- 4 `TscRateMsr`: Not required by L0, not provided to L1
+
+  The main putative use for this would be trying to maintain an
+  invariant TSC across cores with different clock speeds, or after a
+  migrate. Unlike others, this doesn't have an error path to worry
+  about compatibility-wise; and according to tests done when nestedSVM
+  was first implemented, it's actually faster to emulate TscRateMSR in
+  the L0 hypervisor than for L1 to attempt to emulate it itself.
+
+  However, using this properly in L0 will take some implementation
+  effort; and composing it properly with L1 will take even more
+  effort. Just leave it off for now.
+
+- 5 `VmcbClean` *VMCB Clean Bits*: Not required by L0, provide to L1
+
+  This is a pure optimization, both on the side of the L0 and L1. The
+  implementation for L1 is entirely Xen-side, so can be provided even
+  on hardware that doesn't provide it. And it's purely an
+  optimization, so could be "implemented" by ignoring the bits
+  entirely.
+
+  As such, we don't need to require it for L0; and as it's already
+  implemented, no reason not to provide it to L1. Before this feature
+  was available those bits were marked SBZ ("should be zero"); setting
+  them was already advertised to cause unpredictable behavior.
+
+- 6 `FlushByAsid`: Require for L0, provide to L1
+
+  This is cheap and easy to use for L0 and to provide to the L1;
+  there's no reason not to just pass it through.
+
+- 7 `DecodeAssists`: Require for L0, provide to L1
+
+  Using it in L0 reduces the chance that we'll make some sort of error
+  in the decode path. And if hardware supports it, it's easy enough
+  to provide to the L1.
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index bda853e3c9..a25f498265 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -673,6 +673,12 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
          */
         config->flags |= XEN_DOMCTL_CDF_oos_off;
 
+    if ( nested_virt && !hvm_nested_virt_supported() )
+    {
+        dprintk(XENLOG_INFO, "Nested virt requested but not available\n");
+        return -EINVAL;
+    }
+
     if ( nested_virt && !hap )
     {
         dprintk(XENLOG_INFO, "Nested virt not supported without HAP\n");
diff --git a/xen/arch/x86/hvm/nestedhvm.c b/xen/arch/x86/hvm/nestedhvm.c
index 12bf7172b8..451c4da6d4 100644
--- a/xen/arch/x86/hvm/nestedhvm.c
+++ b/xen/arch/x86/hvm/nestedhvm.c
@@ -150,6 +150,16 @@ static int __init cf_check nestedhvm_setup(void)
     __clear_bit(0x80, shadow_io_bitmap[0]);
     __clear_bit(0xed, shadow_io_bitmap[1]);
 
+    /*
+     * NB this must be called after all command-line processing has been
+     * done, so that if (for example) HAP is disabled, nested virt is
+     * disabled as well.
+     */
+    if ( cpu_has_vmx )
+        start_nested_vmx(&hvm_funcs);
+    else if ( cpu_has_svm )
+        start_nested_svm(&hvm_funcs);
+
     return 0;
 }
 __initcall(nestedhvm_setup);
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index a5319ab729..ad2e9f5c35 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1666,3 +1666,17 @@ void svm_nested_features_on_efer_update(struct vcpu *v)
         }
     }
 }
+
+void __init start_nested_svm(struct hvm_function_table *hvm_function_table)
+{
+    /*
+     * Required host functionality to support nested virt. See
+     * docs/designs/nested-svm-cpu-features.md for rationale.
+     */
+    hvm_function_table->caps.nested_virt =
+        hvm_function_table->caps.hap &&
+        cpu_has_svm_lbrv &&
+        cpu_has_svm_nrips &&
+        cpu_has_svm_flushbyasid &&
+        cpu_has_svm_decode;
+}
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index ece0aa243a..ed058d9d2b 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -2816,6 +2816,14 @@ void nvmx_set_cr_read_shadow(struct vcpu *v, unsigned int cr)
     __vmwrite(read_shadow_field, v->arch.hvm.nvcpu.guest_cr[cr]);
 }
 
+void __init start_nested_vmx(struct hvm_function_table *hvm_function_table)
+{
+    /* TODO: Require hardware support before enabling nested virt */
+    hvm_function_table->caps.nested_virt =
+        hvm_function_table->caps.hap;
+}
+
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
index 87a6935d97..e6f937fed7 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -97,7 +97,10 @@ struct hvm_function_table {
              singlestep:1,
 
              /* Hardware virtual interrupt delivery enable? */
-             virtual_intr_delivery:1;
+             virtual_intr_delivery:1,
+
+             /* Nested virt capabilities */
+             nested_virt:1;
     } caps;
 
     /*
@@ -654,6 +657,12 @@ static inline bool hvm_altp2m_supported(void)
     return hvm_funcs.caps.altp2m;
 }
 
+/* Returns true if we have the minimum hardware requirements for nested virt */
+static inline bool hvm_nested_virt_supported(void)
+{
+    return hvm_funcs.caps.nested_virt;
+}
+
 /* updates the current hardware p2m */
 static inline void altp2m_vcpu_update_p2m(struct vcpu *v)
 {
@@ -797,6 +806,11 @@ static inline bool hvm_hap_supported(void)
     return false;
 }
 
+static inline bool hvm_nested_virt_supported(void)
+{
+    return false;
+}
+
 static inline bool nhvm_vmcx_hap_enabled(const struct vcpu *v)
 {
     ASSERT_UNREACHABLE();
diff --git a/xen/arch/x86/include/asm/hvm/nestedhvm.h b/xen/arch/x86/include/asm/hvm/nestedhvm.h
index 56a2019e1b..0568acb25f 100644
--- a/xen/arch/x86/include/asm/hvm/nestedhvm.h
+++ b/xen/arch/x86/include/asm/hvm/nestedhvm.h
@@ -82,4 +82,8 @@ static inline bool vvmcx_valid(const struct vcpu *v)
     return vcpu_nestedhvm(v).nv_vvmcxaddr != INVALID_PADDR;
 }
 
+
+void start_nested_svm(struct hvm_function_table *);
+void start_nested_vmx(struct hvm_function_table *);
+
 #endif /* _HVM_NESTEDHVM_H */
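
To make the gating in this last patch easier to follow outside the Xen tree, here is a rough standalone C sketch of the same idea: compute a single "nested virt supported" flag from HAP plus a set of required SVM feature bits, then refuse domain creation when nested virt is requested without it. The bit numbers follow the list in docs/designs/nested-svm-cpu-features.md. The names SVM_BIT_*, NESTED_SVM_REQUIRED, nested_svm_supported, check_domain_config and nested_caps_selfcheck are made up for illustration and are not Xen code. The sketch treats HAP as a separate boolean, on the assumption that HAP on AMD already implies nested paging, and it takes the raw CPUID EDX value as a plain parameter rather than querying hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID leaf 0x8000000A EDX, per the design document. */
enum {
    SVM_BIT_NPT           = 0,
    SVM_BIT_LBRV          = 1,
    SVM_BIT_SVML          = 2,
    SVM_BIT_NRIPS         = 3,
    SVM_BIT_TSCRATEMSR    = 4,
    SVM_BIT_VMCBCLEAN     = 5,
    SVM_BIT_FLUSHBYASID   = 6,
    SVM_BIT_DECODEASSISTS = 7,
};

#define FEAT(bit) (UINT32_C(1) << (bit))

/* Host features required (besides HAP) before nested virt is offered. */
#define NESTED_SVM_REQUIRED \
    (FEAT(SVM_BIT_LBRV) | FEAT(SVM_BIT_NRIPS) | \
     FEAT(SVM_BIT_FLUSHBYASID) | FEAT(SVM_BIT_DECODEASSISTS))

/* Hypothetical stand-in for the caps.nested_virt computation. */
static bool nested_svm_supported(uint32_t host_edx, bool hap_enabled)
{
    return hap_enabled &&
           (host_edx & NESTED_SVM_REQUIRED) == NESTED_SVM_REQUIRED;
}

/* Hypothetical stand-in for the domain-creation check. */
static int check_domain_config(bool nested_virt_requested,
                               uint32_t host_edx, bool hap_enabled)
{
    if ( nested_virt_requested &&
         !nested_svm_supported(host_edx, hap_enabled) )
        return -EINVAL;

    return 0;
}

/* Small usage example: a host lacking NRIPS cannot offer nested virt. */
void nested_caps_selfcheck(void)
{
    uint32_t full = NESTED_SVM_REQUIRED | FEAT(SVM_BIT_NPT);
    uint32_t no_nrips = full & ~FEAT(SVM_BIT_NRIPS);

    assert(check_domain_config(true, full, true) == 0);
    assert(check_domain_config(true, no_nrips, true) == -EINVAL);
}
```

Keeping the decision in one boolean, computed once after command-line processing, means the domain-creation path only has to test a single flag, which is the shape the patch itself uses with hvm_nested_virt_supported().
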