From patchwork Wed Jun 22 11:15:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Kai" X-Patchwork-Id: 12890531 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 91824C433EF for ; Wed, 22 Jun 2022 11:15:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356882AbiFVLP5 (ORCPT ); Wed, 22 Jun 2022 07:15:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55230 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357028AbiFVLPx (ORCPT ); Wed, 22 Jun 2022 07:15:53 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 474603C488; Wed, 22 Jun 2022 04:15:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1655896547; x=1687432547; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=R9Z6rQxB/ViT0g8LFtl7AUDDX/HTHsPS74al1lNK5LQ=; b=I4sL8MMZvkm3wSiCbqed7vwn0CiO1fFIVcaNkg/56s8Mqn70KC9kF7al meEQOtyyYsErrKfoTD0VTf/b2wBQzGEGHaRDy10WaGfmaanDicCt9ieNa haiUnJi22agESyXiuSmmZzk6DNrzpCgpr6ywLx6l9DH9zpG6ZKQavJDEx czqRsh5MI2nuhnOUM1hk4Bet0Bw6wNnFXCbA++SMzyAl9d8nX4iR0B26B kHXlU3PCVPehDRjbqXRdCEuFs6Z/kjxdVbunFoM1gFsfuuGvVVLzAbjZw D0x3PwsRS2MlzZUOS6okm0aIokIgnvf8gOyUQi1wQX7GzvsjWLe3WR887 g==; X-IronPort-AV: E=McAfee;i="6400,9594,10385"; a="280435982" X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="280435982" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:15:43 -0700 X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="538433157" Received: 
from jmatsis-mobl.amr.corp.intel.com (HELO khuang2-desk.gar.corp.intel.com) ([10.209.178.197]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:15:39 -0700 From: Kai Huang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, kai.huang@intel.com Subject: [PATCH v5 01/22] x86/virt/tdx: Detect TDX during kernel boot Date: Wed, 22 Jun 2022 23:15:30 +1200 Message-Id: <062075b36150b119bf2d0a1262de973b0a2b11a7.1655894131.git.kai.huang@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Intel Trust Domain Extensions (TDX) protects guest VMs from malicious host and certain physical attacks. TDX introduces a new CPU mode called Secure Arbitration Mode (SEAM) and a new isolated range pointed to by the SEAM Range Register (SEAMRR). A CPU-attested software module called 'the TDX module' runs inside the new isolated range to implement the functionality to manage and run protected VMs. Pre-TDX Intel hardware has support for a memory encryption architecture called MKTME. The memory encryption hardware underpinning MKTME is also used for Intel TDX. TDX ends up "stealing" some of the physical address space from the MKTME architecture to provide crypto-protection for VMs. BIOS is responsible for partitioning the "KeyID" space between legacy MKTME and TDX. The KeyIDs reserved for TDX are called 'TDX private KeyIDs' or 'TDX KeyIDs' for short. To enable TDX, BIOS needs to configure SEAMRR (core-scope) and TDX private KeyIDs (package-scope) consistently for all packages. TDX doesn't trust BIOS. 
TDX ensures all BIOS configurations are correct, and if not, refuses to enable SEAMRR on any core. This means detecting SEAMRR alone on the BSP is enough to check whether TDX has been enabled by BIOS. To start to support TDX, create a new arch/x86/virt/vmx/tdx/tdx.c for TDX host kernel support. Add a new Kconfig option CONFIG_INTEL_TDX_HOST to opt in to TDX host kernel support (as distinct from TDX guest kernel support). So far KVM is the only user of TDX. Make the new config option depend on KVM_INTEL. Use early_initcall() to detect whether TDX is enabled by BIOS during kernel boot, and add a function to report that. Use a function instead of a new CPU feature bit. This is because the TDX module needs to be initialized before it can be used to run any TDX guests, and the TDX module is initialized at runtime by the caller who wants to use TDX. Explicitly detect SEAMRR rather than only detecting TDX private KeyIDs. Theoretically, a misconfiguration of TDX private KeyIDs can result in SEAMRR being disabled, but the BSP can still report the correct TDX KeyIDs. Such a BIOS bug can be caught when initializing the TDX module, but it's better to do more detection during boot to provide a more accurate result. Also detect the TDX KeyIDs. This allows userspace to know how many TDX guests the platform can run without needing to wait until TDX is fully functional. 
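The detection arithmetic described above can be tried outside the kernel. What follows is a minimal userspace sketch and not part of this patch: it mirrors the SEAMRR_ENABLED_BITS check and the TDX_KEYID_START()/TDX_KEYID_NUM() decoding from tdx.h on plain 64-bit values. The struct and function names are hypothetical, and the raw MSR values are assumed to be supplied by the caller rather than read from hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as documented in this patch's tdx.h. */
#define SEAMRR_PHYS_MASK_LOCKED		(1ULL << 10)
#define SEAMRR_PHYS_MASK_ENABLED	(1ULL << 11)
#define SEAMRR_ENABLED_BITS \
	(SEAMRR_PHYS_MASK_ENABLED | SEAMRR_PHYS_MASK_LOCKED)

/* Hypothetical container for the decoded TDX private KeyID range. */
struct tdx_keyid_range {
	uint32_t start;	/* first TDX private KeyID */
	uint32_t num;	/* number of TDX private KeyIDs */
};

/* SEAMRR counts as enabled only when both the lock and enable bits are set. */
static inline bool seamrr_enabled(uint64_t seamrr_phys_mask)
{
	return (seamrr_phys_mask & SEAMRR_ENABLED_BITS) == SEAMRR_ENABLED_BITS;
}

/*
 * Decode a raw IA32_MKTME_KEYID_PARTITIONING value:
 *   bits [31:0]  = number of MKTME KeyIDs (MKTME KeyIDs start at 1)
 *   bits [63:32] = number of TDX private KeyIDs
 * TDX private KeyIDs start right after the last MKTME KeyID.
 */
static inline struct tdx_keyid_range decode_keyid_partitioning(uint64_t keyid_part)
{
	struct tdx_keyid_range r;

	r.start = (uint32_t)((keyid_part & 0xffffffffULL) + 1);
	r.num = (uint32_t)(keyid_part >> 32);
	return r;
}

/* Same sanity rule as detect_tdx_keyids(): non-zero start, at least two KeyIDs. */
static inline bool tdx_keyid_range_valid(struct tdx_keyid_range r)
{
	return r.start != 0 && r.num >= 2;
}
```

For example, a partitioning value of 0x0000004000000003 (3 MKTME KeyIDs, 64 TDX KeyIDs) decodes to the private KeyID range [4, 68).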
Signed-off-by: Kai Huang Reviewed-by: Chao Gao --- arch/x86/Kconfig | 13 ++++ arch/x86/Makefile | 2 + arch/x86/include/asm/tdx.h | 7 +++ arch/x86/virt/Makefile | 2 + arch/x86/virt/vmx/Makefile | 2 + arch/x86/virt/vmx/tdx/Makefile | 2 + arch/x86/virt/vmx/tdx/tdx.c | 109 +++++++++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 47 ++++++++++++++ 8 files changed, 184 insertions(+) create mode 100644 arch/x86/virt/Makefile create mode 100644 arch/x86/virt/vmx/Makefile create mode 100644 arch/x86/virt/vmx/tdx/Makefile create mode 100644 arch/x86/virt/vmx/tdx/tdx.c create mode 100644 arch/x86/virt/vmx/tdx/tdx.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 7021ec725dd3..23f21aa3a5c4 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1967,6 +1967,19 @@ config X86_SGX If unsure, say N. +config INTEL_TDX_HOST + bool "Intel Trust Domain Extensions (TDX) host support" + default n + depends on CPU_SUP_INTEL + depends on X86_64 + depends on KVM_INTEL + help + Intel Trust Domain Extensions (TDX) protects guest VMs from malicious + host and certain physical attacks. This option enables necessary TDX + support in host kernel to run protected VMs. + + If unsure, say N. 
+ config EFI bool "EFI runtime service support" depends on ACPI diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 63d50f65b828..2ca3a2a36dc5 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -234,6 +234,8 @@ head-y += arch/x86/kernel/platform-quirks.o libs-y += arch/x86/lib/ +core-y += arch/x86/virt/ + # drivers-y are linked after core-y drivers-$(CONFIG_MATH_EMULATION) += arch/x86/math-emu/ drivers-$(CONFIG_PCI) += arch/x86/pci/ diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 020c81a7c729..97511b76c1ac 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -87,5 +87,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, return -ENODEV; } #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */ + +#ifdef CONFIG_INTEL_TDX_HOST +bool platform_tdx_enabled(void); +#else /* !CONFIG_INTEL_TDX_HOST */ +static inline bool platform_tdx_enabled(void) { return false; } +#endif /* CONFIG_INTEL_TDX_HOST */ + #endif /* !__ASSEMBLY__ */ #endif /* _ASM_X86_TDX_H */ diff --git a/arch/x86/virt/Makefile b/arch/x86/virt/Makefile new file mode 100644 index 000000000000..1e36502cd738 --- /dev/null +++ b/arch/x86/virt/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0-only +obj-y += vmx/ diff --git a/arch/x86/virt/vmx/Makefile b/arch/x86/virt/vmx/Makefile new file mode 100644 index 000000000000..feebda21d793 --- /dev/null +++ b/arch/x86/virt/vmx/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0-only +obj-$(CONFIG_INTEL_TDX_HOST) += tdx/ diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile new file mode 100644 index 000000000000..1bd688684716 --- /dev/null +++ b/arch/x86/virt/vmx/tdx/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0-only +obj-$(CONFIG_INTEL_TDX_HOST) += tdx.o diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c new file mode 100644 index 000000000000..8275007702e6 --- /dev/null +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ 
-0,0 +1,109 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright(c) 2022 Intel Corporation. + * + * Intel Trust Domain Extensions (TDX) support + */ + +#define pr_fmt(fmt) "tdx: " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include "tdx.h" + +static u32 tdx_keyid_start __ro_after_init; +static u32 tdx_keyid_num __ro_after_init; + +/* Detect whether CPU supports SEAM */ +static int detect_seam(void) +{ + u64 mtrrcap, mask; + + /* SEAMRR is reported via MTRRcap */ + if (!boot_cpu_has(X86_FEATURE_MTRR)) + return -ENODEV; + + rdmsrl(MSR_MTRRcap, mtrrcap); + if (!(mtrrcap & MTRR_CAP_SEAMRR)) + return -ENODEV; + + /* The MASK MSR reports whether SEAMRR is enabled */ + rdmsrl(MSR_IA32_SEAMRR_PHYS_MASK, mask); + if ((mask & SEAMRR_ENABLED_BITS) != SEAMRR_ENABLED_BITS) + return -ENODEV; + + pr_info("SEAMRR enabled.\n"); + return 0; +} + +static int detect_tdx_keyids(void) +{ + u64 keyid_part; + + rdmsrl(MSR_IA32_MKTME_KEYID_PARTITIONING, keyid_part); + + tdx_keyid_num = TDX_KEYID_NUM(keyid_part); + tdx_keyid_start = TDX_KEYID_START(keyid_part); + + pr_info("TDX private KeyID range: [%u, %u).\n", + tdx_keyid_start, tdx_keyid_start + tdx_keyid_num); + + /* + * TDX guarantees at least two TDX KeyIDs are configured by + * BIOS, otherwise SEAMRR is disabled. Invalid TDX private + * range means kernel bug (TDX is broken). + */ + if (WARN_ON(!tdx_keyid_start || tdx_keyid_num < 2)) { + tdx_keyid_start = tdx_keyid_num = 0; + return -EINVAL; + } + + return 0; +} + +/* + * Detect TDX via detecting SEAMRR during kernel boot. + * + * To enable TDX, BIOS must configure SEAMRR consistently across all + * CPU cores. TDX doesn't trust BIOS. Instead, MCHECK verifies all + * configurations from BIOS are correct, and if not, it disables TDX + * (SEAMRR is disabled on all cores). This means detecting SEAMRR on + * BSP is enough to determine whether TDX has been enabled by BIOS. 
+ */ +static int __init tdx_early_detect(void) +{ + int ret; + + ret = detect_seam(); + if (ret) + return ret; + + /* + * TDX private KeyIDs are only accessible by SEAM software. + * Only detect TDX KeyIDs when SEAMRR is enabled. + */ + ret = detect_tdx_keyids(); + if (ret) + return ret; + + pr_info("TDX enabled by BIOS.\n"); + return 0; +} +early_initcall(tdx_early_detect); + +/** + * platform_tdx_enabled() - Return whether BIOS has enabled TDX + * + * Return whether BIOS has enabled TDX regardless of whether the TDX module + * has been loaded or not. + */ +bool platform_tdx_enabled(void) +{ + return tdx_keyid_num >= 2; +} diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h new file mode 100644 index 000000000000..f16055cc25f4 --- /dev/null +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -0,0 +1,47 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _X86_VIRT_TDX_H +#define _X86_VIRT_TDX_H + +#include + +/* + * This file contains both macros and data structures defined by the TDX + * architecture and Linux defined software data structures and functions. + * The two should not be mixed together for better readability. The + * architectural definitions come first. + */ + +/* + * Intel Trust Domain CPU Architecture Extension spec: + * + * IA32_MTRRCAP: + * Bit 15: The support of SEAMRR + * + * IA32_SEAMRR_PHYS_MASK (core-scope): + * Bit 10: Lock bit + * Bit 11: Enable bit + */ +#define MTRR_CAP_SEAMRR BIT_ULL(15) + +#define MSR_IA32_SEAMRR_PHYS_MASK 0x00001401 + +#define SEAMRR_PHYS_MASK_ENABLED BIT_ULL(11) +#define SEAMRR_PHYS_MASK_LOCKED BIT_ULL(10) +#define SEAMRR_ENABLED_BITS \ + (SEAMRR_PHYS_MASK_ENABLED | SEAMRR_PHYS_MASK_LOCKED) + +/* + * IA32_MKTME_KEYID_PARTITIONING: + * Bit [31:0]: Number of MKTME KeyIDs. + * Bit [63:32]: Number of TDX private KeyIDs. + * + * MKTME KeyIDs start from KeyID 1. TDX private KeyIDs start + * after the last MKTME KeyID. 
+ */ +#define MSR_IA32_MKTME_KEYID_PARTITIONING 0x00000087 + +#define TDX_KEYID_START(_keyid_part) \ + ((u32)(((_keyid_part) & 0xffffffffull) + 1)) +#define TDX_KEYID_NUM(_keyid_part) ((u32)((_keyid_part) >> 32)) + +#endif From patchwork Wed Jun 22 11:15:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Kai" X-Patchwork-Id: 12890532 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A9AE1C433EF for ; Wed, 22 Jun 2022 11:16:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357028AbiFVLQB (ORCPT ); Wed, 22 Jun 2022 07:16:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55316 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1356938AbiFVLP6 (ORCPT ); Wed, 22 Jun 2022 07:15:58 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DBEA93A71C; Wed, 22 Jun 2022 04:15:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1655896557; x=1687432557; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=c2cDWRUotXGzh7Jz0IL/B+4+Jjak9b59vmSYGksHapI=; b=I6VfN0N67nn8vryf7sJcbql5OiZJGVINae4aGN+TQlMP8B9kQyX6sWIx ba65nKc/DPSwytvX52L1rhYnAnaEfPfiO2l17lhrbxMfXeTzS6nwW08DN Vnh8DhjnXMGag7cJEmsc6PMU94LDRW2CJDzgGBLTCGTcxdkCtM4frr7eD Oop7GmQSqcpZDV5E55l5C6RKDKUgj7PxMsnVUadznvTaVRsY9MIWUt7Ho QGAM7SA21HAfHAG5tvO3zYSE19ciNRH4Otv2BhB7QSTzQtq49UrmfttJ9 xfXNTt4uC7YVJEs78aWSFUCvnUUu5czTS1Nb5YJNFbLrywAd5zbMDd+mf Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10385"; a="305840749" X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="305840749" Received: from 
fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:15:57 -0700 X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="730301933" Received: from jmatsis-mobl.amr.corp.intel.com (HELO khuang2-desk.gar.corp.intel.com) ([10.209.178.197]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:15:52 -0700 From: Kai Huang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: linux-acpi@vger.kernel.org, seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, thomas.lendacky@amd.com, Tianyu.Lan@microsoft.com, rdunlap@infradead.org, Jason@zx2c4.com, juri.lelli@redhat.com, mark.rutland@arm.com, frederic@kernel.org, yuehaibing@huawei.com, dongli.zhang@oracle.com, kai.huang@intel.com Subject: [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug Date: Wed, 22 Jun 2022 23:15:43 +1200 Message-Id: X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Platforms with confidential computing technology may not support ACPI CPU hotplug when such technology is enabled by the BIOS. Examples include Intel platforms which support Intel Trust Domain Extensions (TDX). If the kernel ever receives an ACPI CPU hotplug event, it is likely a BIOS bug. For ACPI CPU hot-add, the kernel should report it as a BIOS bug and reject the new CPU. For hot-removal, for simplicity just assume the kernel cannot continue to work normally, and BUG(). 
Add a new attribute CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED to indicate the platform doesn't support ACPI CPU hotplug, so that the kernel can handle ACPI CPU hotplug events for such a platform. The existing attribute CC_ATTR_HOTPLUG_DISABLED is for software CPU hotplug and thus doesn't fit. In acpi_processor_{add|remove}(), add an early check against this attribute and handle accordingly if it is set. Also take this chance to rename the existing CC_ATTR_HOTPLUG_DISABLED to CC_ATTR_CPU_HOTPLUG_DISABLED as it is for software CPU hotplug. Signed-off-by: Kai Huang --- arch/x86/coco/core.c | 2 +- drivers/acpi/acpi_processor.c | 23 +++++++++++++++++++++++ include/linux/cc_platform.h | 15 +++++++++++++-- kernel/cpu.c | 2 +- 4 files changed, 38 insertions(+), 4 deletions(-) diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c index 4320fadae716..1bde1af75296 100644 --- a/arch/x86/coco/core.c +++ b/arch/x86/coco/core.c @@ -20,7 +20,7 @@ static bool intel_cc_platform_has(enum cc_attr attr) { switch (attr) { case CC_ATTR_GUEST_UNROLL_STRING_IO: - case CC_ATTR_HOTPLUG_DISABLED: + case CC_ATTR_CPU_HOTPLUG_DISABLED: case CC_ATTR_GUEST_MEM_ENCRYPT: case CC_ATTR_MEM_ENCRYPT: return true; diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c index 6737b1cbf6d6..b960db864cd4 100644 --- a/drivers/acpi/acpi_processor.c +++ b/drivers/acpi/acpi_processor.c @@ -15,6 +15,7 @@ #include #include #include +#include #include @@ -357,6 +358,17 @@ static int acpi_processor_add(struct acpi_device *device, struct device *dev; int result = 0; + /* + * If the confidential computing platform doesn't support ACPI + * CPU hotplug, the BIOS should never deliver such an event to + * the kernel. Report ACPI CPU hot-add as a BIOS bug and ignore + * the new CPU. + */ + if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) { + dev_err(&device->dev, "[BIOS bug]: Platform doesn't support ACPI CPU hotplug. 
New CPU ignored.\n"); + return -EINVAL; + } + pr = kzalloc(sizeof(struct acpi_processor), GFP_KERNEL); if (!pr) return -ENOMEM; @@ -434,6 +446,17 @@ static void acpi_processor_remove(struct acpi_device *device) if (!device || !acpi_driver_data(device)) return; + /* + * The confidential computing platform is broken if ACPI CPU + * hot-removal isn't supported but it happened anyway. Assume + * it's not guaranteed that the kernel can continue to work + * normally. Just BUG(). + */ + if (cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED)) { + dev_err(&device->dev, "Platform doesn't support ACPI CPU hotplug. BUG().\n"); + BUG(); + } + pr = acpi_driver_data(device); if (pr->id >= nr_cpu_ids) goto out; diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h index 691494bbaf5a..9ce9256facc8 100644 --- a/include/linux/cc_platform.h +++ b/include/linux/cc_platform.h @@ -74,14 +74,25 @@ enum cc_attr { CC_ATTR_GUEST_UNROLL_STRING_IO, /** - * @CC_ATTR_HOTPLUG_DISABLED: Hotplug is not supported or disabled. + * @CC_ATTR_CPU_HOTPLUG_DISABLED: CPU hotplug is not supported or + * disabled. * * The platform/OS is running as a guest/virtual machine does not * support CPU hotplug feature. * * Examples include TDX Guest. */ - CC_ATTR_HOTPLUG_DISABLED, + CC_ATTR_CPU_HOTPLUG_DISABLED, + + /** + * @CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED: ACPI CPU hotplug is not + * supported. + * + * The platform/OS does not support ACPI CPU hotplug. + * + * Examples include TDX platform. + */ + CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED, }; #ifdef CONFIG_ARCH_HAS_CC_PLATFORM diff --git a/kernel/cpu.c b/kernel/cpu.c index edb8c199f6a3..966772cce063 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1191,7 +1191,7 @@ static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target) * If the platform does not support hotplug, report it explicitly to * differentiate it from a transient offlining failure. 
*/ - if (cc_platform_has(CC_ATTR_HOTPLUG_DISABLED)) + if (cc_platform_has(CC_ATTR_CPU_HOTPLUG_DISABLED)) return -EOPNOTSUPP; if (cpu_hotplug_disabled) return -EBUSY; From patchwork Wed Jun 22 11:15:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Kai" X-Patchwork-Id: 12890533 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B6B3C43334 for ; Wed, 22 Jun 2022 11:16:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357153AbiFVLQN (ORCPT ); Wed, 22 Jun 2022 07:16:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55554 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357117AbiFVLQJ (ORCPT ); Wed, 22 Jun 2022 07:16:09 -0400 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 29C253BFBC; Wed, 22 Jun 2022 04:16:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1655896567; x=1687432567; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=658/EnJQy8x9AQuuVjfquxiCjCcnIIMedjflGOBbyXE=; b=C6wJ7lMBNa4CT0Gv0OB7XvBTPdXfwC9lwL13uH7nWvzZYfqrmAJbvgeE xrR6kUVLDSqlZIXiYqo7VxSwOJsO22VQQDdIDZJ2gAZAm0SM0MrfhGx2W zZSQsm7iHI2D7PuvrpBXk9vWk/yElettgvYHAz7xYd82+uj+q9l+m6Wnl 7fN5GUnvEyBboKV1aIx7C8hhC03yEdkuFZczDQoedR++Uoc7T/EsVd0bc vZozT2dL/gqZWJR42Tku4ISOsSDQqsBh7rWQ8Uzz67uu7ndiK99UVxuDu 6zvqAP2+t6n8ELSS+NCKf7d2d3Ku0oA5xtT3g5chy3+5NafPodNaqWdIX A==; X-IronPort-AV: E=McAfee;i="6400,9594,10385"; a="277937027" X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="277937027" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by 
fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:16:06 -0700 X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="834065539" Received: from jmatsis-mobl.amr.corp.intel.com (HELO khuang2-desk.gar.corp.intel.com) ([10.209.178.197]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:16:03 -0700 From: Kai Huang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: linux-acpi@vger.kernel.org, seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, thomas.lendacky@amd.com, kai.huang@intel.com Subject: [PATCH v5 03/22] cc_platform: Add new attribute to prevent ACPI memory hotplug Date: Wed, 22 Jun 2022 23:15:57 +1200 Message-Id: <87dc19c47bad73509359c8e1e3a81d51d1681e4c.1655894131.git.kai.huang@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Platforms with confidential computing technology may not support ACPI memory hotplug when such technology is enabled by the BIOS. Examples include Intel platforms which support Intel Trust Domain Extensions (TDX). If the kernel ever receives an ACPI memory hotplug event, it is likely a BIOS bug. For ACPI memory hot-add, the kernel should report it as a BIOS bug and reject the new memory. For hot-removal, for simplicity just assume the kernel cannot continue to work normally, and BUG(). Add a new attribute CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED to indicate the platform doesn't support ACPI memory hotplug, so that the kernel can handle ACPI memory hotplug events for such a platform. 
In acpi_memory_device_{add|remove}(), add an early check against this attribute and handle accordingly if it is set. Signed-off-by: Kai Huang --- drivers/acpi/acpi_memhotplug.c | 23 +++++++++++++++++++++++ include/linux/cc_platform.h | 10 ++++++++++ 2 files changed, 33 insertions(+) diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c index 24f662d8bd39..94d6354ea453 100644 --- a/drivers/acpi/acpi_memhotplug.c +++ b/drivers/acpi/acpi_memhotplug.c @@ -15,6 +15,7 @@ #include #include #include +#include #include "internal.h" @@ -291,6 +292,17 @@ static int acpi_memory_device_add(struct acpi_device *device, if (!device) return -EINVAL; + /* + * If the confidential computing platform doesn't support ACPI + * memory hotplug, the BIOS should never deliver such an event to + * the kernel. Report ACPI memory hot-add as a BIOS bug and ignore + * the memory device. + */ + if (cc_platform_has(CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED)) { + dev_err(&device->dev, "[BIOS bug]: Platform doesn't support ACPI memory hotplug. New memory device ignored.\n"); + return -EINVAL; + } + mem_device = kzalloc(sizeof(struct acpi_memory_device), GFP_KERNEL); if (!mem_device) return -ENOMEM; @@ -334,6 +346,17 @@ static void acpi_memory_device_remove(struct acpi_device *device) if (!device || !acpi_driver_data(device)) return; + /* + * The confidential computing platform is broken if ACPI memory + * hot-removal isn't supported but it happened anyway. Assume + * it is not guaranteed that the kernel can continue to work + * normally. Just BUG(). + */ + if (cc_platform_has(CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED)) { + dev_err(&device->dev, "Platform doesn't support ACPI memory hotplug. 
BUG().\n"); + BUG(); + } + mem_device = acpi_driver_data(device); acpi_memory_remove_memory(mem_device); acpi_memory_device_free(mem_device); diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h index 9ce9256facc8..b831c24bd7f6 100644 --- a/include/linux/cc_platform.h +++ b/include/linux/cc_platform.h @@ -93,6 +93,16 @@ enum cc_attr { * Examples include TDX platform. */ CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED, + + /** + * @CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED: ACPI memory hotplug is + * not supported. + * + * The platform/os does not support ACPI memory hotplug. + * + * Examples include TDX platform. + */ + CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED, }; #ifdef CONFIG_ARCH_HAS_CC_PLATFORM From patchwork Wed Jun 22 11:16:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Kai" X-Patchwork-Id: 12890534 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 364CBC43334 for ; Wed, 22 Jun 2022 11:16:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357175AbiFVLQ3 (ORCPT ); Wed, 22 Jun 2022 07:16:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55990 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357168AbiFVLQY (ORCPT ); Wed, 22 Jun 2022 07:16:24 -0400 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 572653C49F; Wed, 22 Jun 2022 04:16:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1655896580; x=1687432580; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DffF+09fSeGG1bAiwSOs2ml2pvVNDElZJnScEzjWCIg=; 
b=Eah4tEqnLa3/Q9690ZSeut0LQj2IUrw9VPvLCycXWYGoJ6+UA9h2oKVu N+1gRFY1n1OHHHNTf0DzJe4WISElqk7vBsSxtYsomcuTmu1han8C8csQi hcWBkdpkH4JdPsMsYMaG2zFICNALZTimH04ydrZrTn7AhtozKjOjBzvX2 ovFs13qHInDokpLx3+ID8P9N0qNWtSeUZGPbOgkVdJMyCyVDzIKb1aSx7 p/5DaOFsjeHmp/osTF34+Kx2KGU75nB0FVzJA1v9xKStP5Weo+1W6IlBh D/0cdh4K/mKsUsw3uwGMq9TkBc0fpjH5eJf62gWahHWrFAvCuhJfIg1ju A==; X-IronPort-AV: E=McAfee;i="6400,9594,10385"; a="279157309" X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="279157309" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:16:19 -0700 X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="677489181" Received: from jmatsis-mobl.amr.corp.intel.com (HELO khuang2-desk.gar.corp.intel.com) ([10.209.178.197]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:16:15 -0700 From: Kai Huang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, thomas.lendacky@amd.com, Tianyu.Lan@microsoft.com, kai.huang@intel.com Subject: [PATCH v5 04/22] x86/virt/tdx: Prevent ACPI CPU hotplug and ACPI memory hotplug Date: Wed, 22 Jun 2022 23:16:07 +1200 Message-Id: <3a1c9807d8c140bdd550cd5736664f86782cca64.1655894131.git.kai.huang@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Intel Trust Domain Extensions (TDX) protects guest VMs from malicious host and certain physical attacks. To guarantee the security, TDX imposes additional requirements on both CPU and memory. TDX doesn't work with ACPI CPU hotplug. 
During platform boot, MCHECK verifies all logical CPUs on all packages are TDX compatible. Any CPU hot-added at runtime is not verified and thus cannot support TDX. TDX also requires all boot-time verified CPUs to remain present during the machine's runtime, so TDX doesn't support ACPI CPU hot-removal either. TDX doesn't work with ACPI memory hotplug either. TDX also provides increased levels of memory confidentiality and integrity. During platform boot, MCHECK also verifies all TDX-capable memory regions are physically present and meet TDX's security requirements. Any hot-added memory is not verified and thus cannot work with TDX. TDX also assumes all TDX-capable memory regions are present during the machine's runtime, so it doesn't support ACPI memory removal either. Select ARCH_HAS_CC_PLATFORM when CONFIG_INTEL_TDX_HOST is on. Set CC vendor to CC_VENDOR_INTEL if TDX is enabled by BIOS, and report ACPI CPU hotplug and ACPI memory hotplug attributes as disabled to prevent them. Note that TDX does allow a CPU to go offline and then be brought up again, so the software CPU hotplug attribute is not reported. Signed-off-by: Kai Huang --- arch/x86/Kconfig | 1 + arch/x86/coco/core.c | 32 +++++++++++++++++++++++++++++++- arch/x86/virt/vmx/tdx/tdx.c | 4 ++++ 3 files changed, 36 insertions(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 23f21aa3a5c4..efa830853e98 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1973,6 +1973,7 @@ config INTEL_TDX_HOST depends on CPU_SUP_INTEL depends on X86_64 depends on KVM_INTEL + select ARCH_HAS_CC_PLATFORM help Intel Trust Domain Extensions (TDX) protects guest VMs from malicious host and certain physical attacks. 
This option enables necessary TDX diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c index 1bde1af75296..e4c9e34c452f 100644 --- a/arch/x86/coco/core.c +++ b/arch/x86/coco/core.c @@ -12,11 +12,14 @@ #include #include +#include +#include static enum cc_vendor vendor __ro_after_init; static u64 cc_mask __ro_after_init; -static bool intel_cc_platform_has(enum cc_attr attr) +#ifdef CONFIG_INTEL_TDX_GUEST +static bool intel_tdx_guest_has(enum cc_attr attr) { switch (attr) { case CC_ATTR_GUEST_UNROLL_STRING_IO: @@ -28,6 +31,33 @@ static bool intel_cc_platform_has(enum cc_attr attr) return false; } } +#endif + +#ifdef CONFIG_INTEL_TDX_HOST +static bool intel_tdx_host_has(enum cc_attr attr) +{ + switch (attr) { + case CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED: + case CC_ATTR_ACPI_MEMORY_HOTPLUG_DISABLED: + return true; + default: + return false; + } +} +#endif + +static bool intel_cc_platform_has(enum cc_attr attr) +{ +#ifdef CONFIG_INTEL_TDX_GUEST + if (boot_cpu_has(X86_FEATURE_TDX_GUEST)) + return intel_tdx_guest_has(attr); +#endif +#ifdef CONFIG_INTEL_TDX_HOST + if (platform_tdx_enabled()) + return intel_tdx_host_has(attr); +#endif + return false; +} /* * SME and SEV are very similar but they are not the same, so there are diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 8275007702e6..eb3294bf1b0a 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -15,6 +15,7 @@ #include #include #include +#include #include "tdx.h" static u32 tdx_keyid_start __ro_after_init; @@ -92,6 +93,9 @@ static int __init tdx_early_detect(void) if (ret) return ret; + /* Set TDX enabled platform as confidential computing platform */ + cc_set_vendor(CC_VENDOR_INTEL); + pr_info("TDX enabled by BIOS.\n"); return 0; } From patchwork Wed Jun 22 11:16:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Kai" X-Patchwork-Id: 12890535 Return-Path: X-Spam-Checker-Version: 
SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1BC29C433EF for ; Wed, 22 Jun 2022 11:16:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357288AbiFVLQp (ORCPT ); Wed, 22 Jun 2022 07:16:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56398 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357187AbiFVLQj (ORCPT ); Wed, 22 Jun 2022 07:16:39 -0400 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A018B3BFBC; Wed, 22 Jun 2022 04:16:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1655896592; x=1687432592; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=B8lOyV+NN2DHameRBeRw32eUf0Rp3UclgjoKHzIwykM=; b=DSNsWYHTH9Ojh3t8NGX+dBgFtPQFV8/1sZ92J2yjg9nNcMKcGUTTbowJ CfszV0gjQhz9j1SEWFmTV3cmFEk5Rpt/tEwvxWEZGX3sj2CghdreTib80 8YG28UNaA0P1ahPDShP/+XC0QB+jqCMlXN/V+3VMy2jc37ZVdxDzukJbD GYHTgecvo11V1H7uaWhRD1pHgs1rXo5W3hbNRbDYcXVWINCjQ17NfYoaI 4U5zZOOi93ajvJzAEGkAN/6uxLgEmIfg6APIH9WH4goxmCd8tpfdhSngF pSFiyFSkHHdtaVoGu9D0VoJ5xLeUBO2iFyDuzNA4XAkRvSPce08tjszJm g==; X-IronPort-AV: E=McAfee;i="6400,9594,10385"; a="260820294" X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="260820294" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:16:32 -0700 X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="585679747" Received: from jmatsis-mobl.amr.corp.intel.com (HELO khuang2-desk.gar.corp.intel.com) ([10.209.178.197]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:16:28 -0700 From: Kai Huang To: 
linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, akpm@linux-foundation.org, kai.huang@intel.com Subject: [PATCH v5 05/22] x86/virt/tdx: Prevent hot-add driver managed memory Date: Wed, 22 Jun 2022 23:16:19 +1200 Message-Id: <173e1f9b2348f29e5f7d939855b8dd98625bcb35.1655894131.git.kai.huang@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org TDX provides increased levels of memory confidentiality and integrity. This requires special hardware support for features like memory encryption and storage of memory integrity checksums. Not all memory satisfies these requirements. As a result, the TDX introduced the concept of a "Convertible Memory Region" (CMR). During boot, the firmware builds a list of all of the memory ranges which can provide the TDX security guarantees. The list of these ranges is available to the kernel by querying the TDX module. However those TDX-capable memory regions are not automatically usable to the TDX module. The kernel needs to choose which convertible memory regions to be the TDX-usable memory and pass those regions to the TDX module when initializing the module. Once those ranges are passed to the TDX module, the TDX-usable memory regions are fixed during module's lifetime. To avoid having to modify the page allocator to distinguish TDX and non-TDX memory allocation, this implementation guarantees all pages managed by the page allocator are TDX memory. This means any hot-added memory to the page allocator will break such guarantee thus should be prevented. 
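To make the guarantee above concrete, here is a minimal userspace sketch of a hot-add pre-check that refuses new memory once TDX is enabled. It is only an illustration under stated assumptions: sketch_tdx_enabled, sketch_memory_add_precheck() and sketch_add_memory() are hypothetical names, the "page allocator" is just a counter, and none of this is kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for "TDX is enabled by BIOS". */
bool sketch_tdx_enabled;

/* Number of ranges that reached the pretend page allocator. */
int sketch_ranges_added;

/* Pre-check: return 0 to allow the hot-add, a negative errno to refuse. */
int sketch_memory_add_precheck(uint64_t start, uint64_t size)
{
	assert(size != 0);	/* an empty range makes no sense here */

	if (!sketch_tdx_enabled)
		return 0;

	fprintf(stderr, "refusing to add memory [0x%llx, 0x%llx)\n",
		(unsigned long long)start,
		(unsigned long long)(start + size));
	return -EINVAL;
}

/* Hot-add path: consult the pre-check before touching the allocator. */
int sketch_add_memory(uint64_t start, uint64_t size)
{
	int ret = sketch_memory_add_precheck(start, size);

	if (ret)
		return ret;

	sketch_ranges_added++;	/* stands in for the real hot-add work */
	return 0;
}
```

Running the check before any allocator state changes means a refused range leaves nothing to undo.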
There are basically two memory hot-add cases that need to be prevented: ACPI memory hot-add and driver managed memory hot-add. However, adding new memory to ZONE_DEVICE should not be prevented as those pages are not managed by the page allocator. Therefore memremap_pages() variants should be allowed although they internally also use memory hotplug functions. ACPI memory hotplug is already prevented. To prevent driver managed memory and still allow memremap_pages() variants to work, add a __weak hook to do arch-specific check in add_memory_resource(). Implement the x86 version to prevent new memory region from being added when TDX is enabled by BIOS. The __weak arch-specific hook is used instead of a new CC_ATTR similar to disable software CPU hotplug. It is because some driver managed memory resources may actually be TDX-capable (such as legacy PMEM, which is underneath indeed RAM), and the arch-specific hook can be further enhanced to allow those when needed. Note arch-specific hook for __remove_memory() is not required. Both ACPI hot-removal and driver managed memory removal cannot reach it. Signed-off-by: Kai Huang --- arch/x86/mm/init_64.c | 21 +++++++++++++++++++++ include/linux/memory_hotplug.h | 2 ++ mm/memory_hotplug.c | 15 +++++++++++++++ 3 files changed, 38 insertions(+) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 96d34ebb20a9..ce89cf88a818 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -55,6 +55,7 @@ #include #include #include +#include #include "mm_internal.h" @@ -972,6 +973,26 @@ int arch_add_memory(int nid, u64 start, u64 size, return add_pages(nid, start_pfn, nr_pages, params); } +int arch_memory_add_precheck(int nid, u64 start, u64 size, mhp_t mhp_flags) +{ + if (!platform_tdx_enabled()) + return 0; + + /* + * TDX needs to guarantee all pages managed by the page allocator + * are TDX memory in order to not have to distinguish TDX and + * non-TDX memory allocation. 
The kernel needs to pass the + * TDX-usable memory regions to the TDX module when it gets + * initialized. After that, the TDX-usable memory regions are + * fixed. This means any memory hot-add to the page allocator + * will break above guarantee thus should be prevented. + */ + pr_err("Unable to add memory [0x%llx, 0x%llx) on TDX enabled platform.\n", + start, start + size); + + return -EINVAL; +} + static void __meminit free_pagetable(struct page *page, int order) { unsigned long magic; diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 1ce6f8044f1e..306ef4ceb419 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -325,6 +325,8 @@ extern int add_memory_resource(int nid, struct resource *resource, extern int add_memory_driver_managed(int nid, u64 start, u64 size, const char *resource_name, mhp_t mhp_flags); +extern int arch_memory_add_precheck(int nid, u64 start, u64 size, + mhp_t mhp_flags); extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages, struct vmem_altmap *altmap, int migratetype); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 416b38ca8def..2ad4b2603c7c 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1296,6 +1296,17 @@ bool mhp_supports_memmap_on_memory(unsigned long size) IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)); } +/* + * Pre-check whether hot-add memory is allowed before arch_add_memory(). + * + * Arch to provide replacement version if required. + */ +int __weak arch_memory_add_precheck(int nid, u64 start, u64 size, + mhp_t mhp_flags) +{ + return 0; +} + /* * NOTE: The caller must call lock_device_hotplug() to serialize hotplug * and online/offline operations (triggered e.g. by sysfs). 
@@ -1319,6 +1330,10 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags) if (ret) return ret; + ret = arch_memory_add_precheck(nid, start, size, mhp_flags); + if (ret) + return ret; + if (mhp_flags & MHP_NID_IS_MGID) { group = memory_group_find_by_id(nid); if (!group) From patchwork Wed Jun 22 11:16:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Kai" X-Patchwork-Id: 12890536 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 583BDCCA47D for ; Wed, 22 Jun 2022 11:17:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356755AbiFVLRT (ORCPT ); Wed, 22 Jun 2022 07:17:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56506 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357323AbiFVLRB (ORCPT ); Wed, 22 Jun 2022 07:17:01 -0400 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 799323BFBC; Wed, 22 Jun 2022 04:16:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1655896618; x=1687432618; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=0+Ia/Mrs6xjJ9gyYaF6VIrm6VlKPab2Ql2n9Y0pYkTE=; b=JsRSGjvDMyYMKyUtAZZgMnaL+RLbzUnDNICooEDatr1fcHzagNB5+gMR lUA+aC66nZT28v4iBzKIOM/N83uIGB/2oQcd9P08+jAyzRrdeinJOI5Vu k9+A4lXrMpJV1ioRQ2gipQKGIPvFXTv4ANQCOCrWli2T996v/4afgY2ho nixYeVE4A4ER5tO1jj5U5AeEV7Sfvbxssr0uW4nkQZdBh5u1RhNZRVnIf 99fI82vs7kcV8GB4xwlmVeuQX/8xefxFCqvy1UIxkktSyN2hOnnk2Z7p/ dFgBn97BdoTvY0xPVb8pr/b+xecYdYhHXu5PPrYbdZGHVQAq87c2sSxUE A==; X-IronPort-AV: E=McAfee;i="6400,9594,10385"; a="344379984" X-IronPort-AV: 
E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="344379984" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:16:58 -0700 X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="834065728" Received: from jmatsis-mobl.amr.corp.intel.com (HELO khuang2-desk.gar.corp.intel.com) ([10.209.178.197]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:16:54 -0700 From: Kai Huang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, kai.huang@intel.com Subject: [PATCH v5 06/22] x86/virt/tdx: Add skeleton to initialize TDX on demand Date: Wed, 22 Jun 2022 23:16:29 +1200 Message-Id: X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Before the TDX module can be used to create and run TD guests, it must be loaded into the isolated region pointed by the SEAMRR and properly initialized. The TDX module is expected to be loaded by BIOS before booting to the kernel, and the kernel is expected to detect and initialize it. The TDX module can be initialized only once in its lifetime. Instead of always initializing it at boot time, this implementation chooses an on-demand approach to initialize TDX until there is a real need (e.g when requested by KVM). This avoids consuming the memory that must be allocated by kernel and given to the TDX module as metadata (~1/256th of the TDX-usable memory), and also saves the time of initializing the TDX module (and the metadata) when TDX is not used at all. 
Initializing the TDX module on demand at runtime is also more flexible for supporting TDX module runtime updates in the future (after the TDX module is updated, it needs to be initialized again).

Add a placeholder tdx_init() to detect and initialize the TDX module on demand, with a state machine protected by a mutex to support concurrent calls from multiple callers.

The TDX module will be initialized in multiple steps defined by the TDX architecture:

  1) Global initialization;
  2) Logical-CPU scope initialization;
  3) Enumerate the TDX module capabilities and platform configuration;
  4) Configure the TDX module with the usable memory ranges and global KeyID information;
  5) Package-scope configuration for the global KeyID;
  6) Initialize usable memory ranges based on 4).

The TDX module can also be shut down at any time during its lifetime. In case of any error during the initialization process, shut down the module. It's pointless to leave the module in any intermediate state during the initialization.

Signed-off-by: Kai Huang
Reviewed-by: Chao Gao
---

- v3->v5 (no feedback on v4):

 - Removed the check that SEAMRR and TDX KeyID have been detected on all present CPUs.
 - Removed tdx_detect().
 - Added num_online_cpus() to the MADT-enabled CPUs check within the CPU hotplug lock, returning early with an error message.
 - Improved dmesg printing for TDX module detection and initialization.
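The on-demand state machine described in this commit message can be pictured with a small userspace sketch. It is not the code from this patch: all sketch_* names are hypothetical, the multi-step initialization is replaced by a knob (sketch_init_result), the mutex is omitted, and recording the "not loaded" and "initialized" outcomes in the state is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>

enum sketch_status {
	SKETCH_UNKNOWN,		/* not yet detected or initialized */
	SKETCH_NONE,		/* module not loaded */
	SKETCH_INITIALIZED,	/* initialization succeeded */
	SKETCH_SHUTDOWN,	/* initialization failed, module shut down */
};

enum sketch_status sketch_state = SKETCH_UNKNOWN;

/* Hypothetical knob: what the stubbed-out initialization returns. */
int sketch_init_result;

static int sketch_do_init(void)
{
	int ret = sketch_init_result;

	if (ret == -ENODEV) {
		sketch_state = SKETCH_NONE;
		return ret;
	}
	if (ret) {
		/* Any other failure: shut down and report -EFAULT. */
		sketch_state = SKETCH_SHUTDOWN;
		return -EFAULT;
	}
	sketch_state = SKETCH_INITIALIZED;
	return 0;
}

int sketch_tdx_init(void)
{
	/* A real implementation would hold a lock across this switch. */
	switch (sketch_state) {
	case SKETCH_UNKNOWN:
		return sketch_do_init();
	case SKETCH_NONE:
		return -ENODEV;
	case SKETCH_INITIALIZED:
		return 0;
	default:
		assert(sketch_state == SKETCH_SHUTDOWN);
		return -EFAULT;
	}
}
```

Because the outcome is sticky, a second caller never repeats the initialization: it gets 0 after a success and -EFAULT after a failure, whatever the knob says by then.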
--- arch/x86/include/asm/tdx.h | 2 + arch/x86/virt/vmx/tdx/tdx.c | 153 ++++++++++++++++++++++++++++++++++++ 2 files changed, 155 insertions(+) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 97511b76c1ac..801f6e10b2db 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -90,8 +90,10 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, #ifdef CONFIG_INTEL_TDX_HOST bool platform_tdx_enabled(void); +int tdx_init(void); #else /* !CONFIG_INTEL_TDX_HOST */ static inline bool platform_tdx_enabled(void) { return false; } +static inline int tdx_init(void) { return -ENODEV; } #endif /* CONFIG_INTEL_TDX_HOST */ #endif /* !__ASSEMBLY__ */ diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index eb3294bf1b0a..1f9d8108eeea 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -10,17 +10,39 @@ #include #include #include +#include +#include +#include #include #include #include #include +#include #include #include #include "tdx.h" +/* + * TDX module status during initialization + */ +enum tdx_module_status_t { + /* TDX module hasn't been detected and initialized */ + TDX_MODULE_UNKNOWN, + /* TDX module is not loaded */ + TDX_MODULE_NONE, + /* TDX module is initialized */ + TDX_MODULE_INITIALIZED, + /* TDX module is shut down due to initialization error */ + TDX_MODULE_SHUTDOWN, +}; + static u32 tdx_keyid_start __ro_after_init; static u32 tdx_keyid_num __ro_after_init; +static enum tdx_module_status_t tdx_module_status; +/* Prevent concurrent attempts on TDX detection and initialization */ +static DEFINE_MUTEX(tdx_module_lock); + /* Detect whether CPU supports SEAM */ static int detect_seam(void) { @@ -101,6 +123,84 @@ static int __init tdx_early_detect(void) } early_initcall(tdx_early_detect); +/* + * Detect and initialize the TDX module. 
+ * + * Return -ENODEV when the TDX module is not loaded, 0 when it + * is successfully initialized, or other error when it fails to + * initialize. + */ +static int init_tdx_module(void) +{ + /* The TDX module hasn't been detected */ + return -ENODEV; +} + +static void shutdown_tdx_module(void) +{ + /* TODO: Shut down the TDX module */ + tdx_module_status = TDX_MODULE_SHUTDOWN; +} + +static int __tdx_init(void) +{ + int ret; + + /* + * Initializing the TDX module requires running some code on + * all MADT-enabled CPUs. If not all MADT-enabled CPUs are + * online, it's not possible to initialize the TDX module. + * + * For simplicity temporarily disable CPU hotplug to prevent + * any CPU from going offline during the initialization. + */ + cpus_read_lock(); + + /* + * Check whether all MADT-enabled CPUs are online and return + * early with an explicit message so the user can be aware. + * + * Note ACPI CPU hotplug is prevented when TDX is enabled, so + * num_processors always reflects all present MADT-enabled + * CPUs during boot when disabled_cpus is 0. + */ + if (disabled_cpus || num_online_cpus() != num_processors) { + pr_err("Unable to initialize the TDX module when there's offline CPU(s).\n"); + ret = -EINVAL; + goto out; + } + + ret = init_tdx_module(); + if (ret == -ENODEV) { + pr_info("TDX module is not loaded.\n"); + goto out; + } + + /* + * Shut down the TDX module in case of any error during the + * initialization process. It's meaningless to leave the TDX + * module in any middle state of the initialization process. + * + * Shutting down the module also requires running some code on + * all MADT-enabled CPUs. Do it while CPU hotplug is disabled. + * + * Return all errors during initialization as -EFAULT as + * the TDX module is always shut down in such cases. + */ + if (ret) { + pr_info("Failed to initialize TDX module. 
Shut it down.\n");
+		shutdown_tdx_module();
+		ret = -EFAULT;
+		goto out;
+	}
+
+	pr_info("TDX module initialized.\n");
+out:
+	cpus_read_unlock();
+
+	return ret;
+}
+
 /**
  * platform_tdx_enabled() - Return whether BIOS has enabled TDX
  *
@@ -111,3 +211,56 @@ bool platform_tdx_enabled(void)
 {
 	return tdx_keyid_num >= 2;
 }
+
+/**
+ * tdx_init - Initialize the TDX module
+ *
+ * Initialize the TDX module to make it ready to run TD guests.
+ *
+ * The caller must make sure all CPUs are online before calling this
+ * function. CPU hotplug is temporarily disabled internally to prevent
+ * any CPU from going offline.
+ *
+ * This function can be called in parallel by multiple callers.
+ *
+ * Return:
+ *
+ * * 0:		The TDX module has been successfully initialized.
+ * * -ENODEV:	The TDX module is not loaded, or TDX is not supported.
+ * * -EINVAL:	The TDX module cannot be initialized because certain
+ *		conditions are not met (e.g. when not all MADT-enabled
+ *		CPUs are online).
+ * * -EFAULT:	Other internal fatal errors, or the TDX module is in
+ *		shutdown mode because it failed to initialize in a
+ *		previous attempt.
+ */ +int tdx_init(void) +{ + int ret; + + if (!platform_tdx_enabled()) + return -ENODEV; + + mutex_lock(&tdx_module_lock); + + switch (tdx_module_status) { + case TDX_MODULE_UNKNOWN: + ret = __tdx_init(); + break; + case TDX_MODULE_NONE: + ret = -ENODEV; + break; + case TDX_MODULE_INITIALIZED: + ret = 0; + break; + default: + WARN_ON_ONCE(tdx_module_status != TDX_MODULE_SHUTDOWN); + ret = -EFAULT; + break; + } + + mutex_unlock(&tdx_module_lock); + + return ret; +} +EXPORT_SYMBOL_GPL(tdx_init); From patchwork Wed Jun 22 11:16:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Kai" X-Patchwork-Id: 12890537 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 91740C433EF for ; Wed, 22 Jun 2022 11:17:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356672AbiFVLRU (ORCPT ); Wed, 22 Jun 2022 07:17:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57348 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357361AbiFVLRC (ORCPT ); Wed, 22 Jun 2022 07:17:02 -0400 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6F01E3A73C; Wed, 22 Jun 2022 04:17:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1655896621; x=1687432621; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=azk6eOlLWyY3qAb8HMa/IfCadnDkwXnmK4nwWuUEqaM=; b=DsLm1vjlYNGdKZkfLb9HDjGwSFGIiQHrw8Fb5Q2yv8YayFYvlK+73eSV 0LOQIOvoUNQPeARyemSECcsgTMaTDZ7+oDm5Tue98nxVo8ndILYG2Icop Y069vZjcPsoUxrDrEd3y6QPApS8bVYHG9AbaDauQytn3ebT8Azv2zpgrX 
em9N1gt50jTC8Fl3m5/XE/lOt8qF/XZT9AxuMeIcmcvry7qPtBxwg+Uax N5ly5+KdvhmZBPgT8FJxB4JSjr43CpXcPZZnDwMaTdoKjwm+ZxMkj2A+y yz0SbAxeNx/WBI72cVVO2aYnr9fn1q284CA3bA4ZAGuo+Ou1EwXPlQCE6 w==; X-IronPort-AV: E=McAfee;i="6400,9594,10385"; a="344379990" X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="344379990" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:17:01 -0700 X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="834065741" Received: from jmatsis-mobl.amr.corp.intel.com (HELO khuang2-desk.gar.corp.intel.com) ([10.209.178.197]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:16:58 -0700 From: Kai Huang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, kai.huang@intel.com Subject: [PATCH v5 07/22] x86/virt/tdx: Implement SEAMCALL function Date: Wed, 22 Jun 2022 23:16:30 +1200 Message-Id: <095e6bbc57b4470e1e9a9104059a5238c9775f00.1655894131.git.kai.huang@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM). This mode runs only the TDX module itself or other code to load the TDX module. The host kernel communicates with SEAM software via a new SEAMCALL instruction. This is conceptually similar to a guest->host hypercall, except it is made from the host to SEAM software instead. The TDX module defines SEAMCALL leaf functions to allow the host to initialize it, and to create and run protected VMs. 
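As an illustration of what calling a leaf function involves, the sketch below packs a leaf number and four inputs into the register image that the __seamcall() comment in this patch documents (leaf in RAX, inputs in RCX, RDX, R8 and R9). It only models the marshalling in plain C; struct sketch_seamcall_regs and sketch_marshal() are hypothetical names, and no SEAMCALL instruction is executed.

```c
#include <assert.h>
#include <stdint.h>

/* Register image handed to the SEAM software (inputs only). */
struct sketch_seamcall_regs {
	uint64_t rax;	/* leaf number */
	uint64_t rcx;	/* input parameter 1 */
	uint64_t rdx;	/* input parameter 2 */
	uint64_t r8;	/* input parameter 3 */
	uint64_t r9;	/* input parameter 4 */
};

/* Move C-level arguments into the registers the leaf expects. */
struct sketch_seamcall_regs sketch_marshal(uint64_t fn, uint64_t in1,
					   uint64_t in2, uint64_t in3,
					   uint64_t in4)
{
	struct sketch_seamcall_regs regs = {
		.rax = fn,
		.rcx = in1,
		.rdx = in2,
		.r8 = in3,
		.r9 = in4,
	};

	return regs;
}

/* Usage example: leaf 44 is TDH_SYS_LP_SHUTDOWN in this series. */
void sketch_marshal_example(void)
{
	struct sketch_seamcall_regs regs = sketch_marshal(44, 0, 0, 0, 0);

	assert(regs.rax == 44);
}
```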
SEAMCALL leaf functions use an ABI different from the x86-64 system-v ABI. Instead, they share the same ABI with the TDCALL leaf functions. Implement a function __seamcall() to allow the host to make SEAMCALL to SEAM software using the TDX_MODULE_CALL macro which is the common assembly for both SEAMCALL and TDCALL. SEAMCALL instruction causes #GP when SEAMRR isn't enabled, and #UD when CPU is not in VMX operation. The TDX_MODULE_CALL macro doesn't handle SEAMCALL exceptions. Leave to the caller to guarantee those conditions before calling __seamcall(). Signed-off-by: Kai Huang --- - v3 -> v5 (no feedback on v4): - Explicitly tell TDX_SEAMCALL_VMFAILINVALID is returned if the SEAMCALL itself fails. - Improve the changelog. --- arch/x86/virt/vmx/tdx/Makefile | 2 +- arch/x86/virt/vmx/tdx/seamcall.S | 52 ++++++++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 11 +++++++ 3 files changed, 64 insertions(+), 1 deletion(-) create mode 100644 arch/x86/virt/vmx/tdx/seamcall.S diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile index 1bd688684716..fd577619620e 100644 --- a/arch/x86/virt/vmx/tdx/Makefile +++ b/arch/x86/virt/vmx/tdx/Makefile @@ -1,2 +1,2 @@ # SPDX-License-Identifier: GPL-2.0-only -obj-$(CONFIG_INTEL_TDX_HOST) += tdx.o +obj-$(CONFIG_INTEL_TDX_HOST) += tdx.o seamcall.o diff --git a/arch/x86/virt/vmx/tdx/seamcall.S b/arch/x86/virt/vmx/tdx/seamcall.S new file mode 100644 index 000000000000..f322427e48c3 --- /dev/null +++ b/arch/x86/virt/vmx/tdx/seamcall.S @@ -0,0 +1,52 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#include +#include + +#include "tdxcall.S" + +/* + * __seamcall() - Host-side interface functions to SEAM software module + * (the P-SEAMLDR or the TDX module). + * + * Transform function call register arguments into the SEAMCALL register + * ABI. Return TDX_SEAMCALL_VMFAILINVALID if the SEAMCALL itself fails, + * or the completion status of the SEAMCALL leaf function. 
Additional + * output operands are saved in @out (if it is provided by caller). + * + *------------------------------------------------------------------------- + * SEAMCALL ABI: + *------------------------------------------------------------------------- + * Input Registers: + * + * RAX - SEAMCALL Leaf number. + * RCX,RDX,R8-R9 - SEAMCALL Leaf specific input registers. + * + * Output Registers: + * + * RAX - SEAMCALL completion status code. + * RCX,RDX,R8-R11 - SEAMCALL Leaf specific output registers. + * + *------------------------------------------------------------------------- + * + * __seamcall() function ABI: + * + * @fn (RDI) - SEAMCALL Leaf number, moved to RAX + * @rcx (RSI) - Input parameter 1, moved to RCX + * @rdx (RDX) - Input parameter 2, moved to RDX + * @r8 (RCX) - Input parameter 3, moved to R8 + * @r9 (R8) - Input parameter 4, moved to R9 + * + * @out (R9) - struct tdx_module_output pointer + * stored temporarily in R12 (not + * used by the P-SEAMLDR or the TDX + * module). It can be NULL. + * + * Return (via RAX) the completion status of the SEAMCALL, or + * TDX_SEAMCALL_VMFAILINVALID. + */ +SYM_FUNC_START(__seamcall) + FRAME_BEGIN + TDX_MODULE_CALL host=1 + FRAME_END + RET +SYM_FUNC_END(__seamcall) diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index f16055cc25f4..f1a2dfb978b1 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -2,6 +2,7 @@ #ifndef _X86_VIRT_TDX_H #define _X86_VIRT_TDX_H +#include #include /* @@ -44,4 +45,14 @@ ((u32)(((_keyid_part) & 0xffffffffull) + 1)) #define TDX_KEYID_NUM(_keyid_part) ((u32)((_keyid_part) >> 32)) + +/* + * Do not put any hardware-defined TDX structure representations below this + * comment! 
+ */ + +struct tdx_module_output; +u64 __seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9, + struct tdx_module_output *out); + #endif From patchwork Wed Jun 22 11:16:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Kai" X-Patchwork-Id: 12890538 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E7DECCCA47E for ; Wed, 22 Jun 2022 11:17:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357221AbiFVLRX (ORCPT ); Wed, 22 Jun 2022 07:17:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57574 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1356235AbiFVLRI (ORCPT ); Wed, 22 Jun 2022 07:17:08 -0400 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D5F5A3C708; Wed, 22 Jun 2022 04:17:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1655896624; x=1687432624; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qZm5IL5AXGr30KfU2T1VkeYUTS6Yir7Q2PVC8bKQsro=; b=YqkGB27A9/Lm1NHo8HNwxiptrV+8BqEz+2KMBBqLaAMnUB8C2ewObWcY LlqRM5+7X0obAb1zXDvW9UGNQIGAo+JQNSQSpUOhvDE13/GPY3qsaX88W pay0v20/8RZpMtuTxhoo4+M+ZlBPf6kEBew8969ZXJOgc1KDFz8ONNo4q HjAph8hHOZ9BizzZ9IoyjdG9hPgdhgbRvp3SsyZ/Xf10FspdOzcKX+am9 UrHHop8OEYKOtmy+Qng3oqc5ieG7jqfJn21jZsltcDAsybu2vNMo/I2jQ PhgYCWnFea34LuCYJKfxCSOf6TQLOP8A6aWNqwnnLvoy4KyamsCTRRuXo w==; X-IronPort-AV: E=McAfee;i="6400,9594,10385"; a="344380007" X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="344380007" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga105.jf.intel.com with 
 ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:17:04 -0700
X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="834065761"
Received: from jmatsis-mobl.amr.corp.intel.com (HELO khuang2-desk.gar.corp.intel.com) ([10.209.178.197]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:17:01 -0700
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, kai.huang@intel.com
Subject: [PATCH v5 08/22] x86/virt/tdx: Shut down TDX module in case of error
Date: Wed, 22 Jun 2022 23:16:31 +1200
Message-Id: <89fffc70cdbb74c80bb324364b712ec41e5f8b91.1655894131.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.36.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

TDX supports shutting down the TDX module at any time during its lifetime. After the module is shut down, no further TDX module SEAMCALL leaf functions can be made to the module on any logical CPU.

Shut down the TDX module in case of any error during the initialization process. It's pointless to leave the TDX module in some intermediate state.

Shutting down the TDX module requires calling TDH.SYS.LP.SHUTDOWN on all BIOS-enabled CPUs, and the SEAMCALL can run concurrently on different CPUs. Implement a mechanism to run a SEAMCALL concurrently on all online CPUs and use it to shut down the module. The later logical-CPU scope module initialization will use it too.

Also add a wrapper of __seamcall() which additionally prints out the error information if the SEAMCALL fails. It will be useful during the TDX module initialization as it provides more error information to the user.
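The "run on every CPU and remember whether anything failed" mechanism, together with the error-printing wrapper, can be sketched in userspace as below. This is an illustration only: the sketch walks the CPUs sequentially in a loop rather than concurrently, SKETCH_VMFAILINVALID is a placeholder and not the kernel's constant, the leaf is a plain function pointer, and all sketch_* names and the error value returned by the failing stub are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder for "the call itself failed"; NOT the kernel's value. */
#define SKETCH_VMFAILINVALID UINT64_MAX

typedef uint64_t (*sketch_leaf_fn)(int cpu);

struct sketch_seamcall_ctx {
	sketch_leaf_fn leaf;	/* what to run on each CPU */
	int err;		/* set to -EFAULT if any CPU fails */
};

/* Wrapper: print details only when the leaf returned an error code. */
static uint64_t sketch_seamcall(sketch_leaf_fn leaf, int cpu)
{
	uint64_t ret = leaf(cpu);

	if (ret == 0 || ret == SKETCH_VMFAILINVALID)
		return ret;

	fprintf(stderr, "call failed on cpu %d: error 0x%" PRIx64 "\n",
		cpu, ret);
	return ret;
}

/* Sequential stand-in for running the leaf on all online CPUs. */
void sketch_seamcall_on_each_cpu(struct sketch_seamcall_ctx *ctx,
				 int nr_cpus)
{
	assert(ctx && ctx->leaf);

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (sketch_seamcall(ctx->leaf, cpu))
			ctx->err = -EFAULT;
	}
}

/* Example leaves: one always succeeds, one fails on CPU 2 only. */
uint64_t sketch_leaf_ok(int cpu)
{
	(void)cpu;
	return 0;
}

uint64_t sketch_leaf_fails_on_cpu2(int cpu)
{
	return cpu == 2 ? 0x8000000000000001ULL : 0;
}
```

Collapsing every per-CPU failure into a single error field keeps the caller's check to one comparison, at the cost of not knowing which CPU failed beyond what the wrapper printed.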

The SEAMCALL instruction causes #UD if the CPU is not in VMX operation (i.e. VMXON has not been done). So far only KVM supports VMXON. It guarantees all online CPUs are in VMX operation as long as any VM still exists. As KVM is so far also the only user of TDX, just let the caller guarantee that all CPUs are in VMX operation during tdx_init().

Adding support for VMXON/VMXOFF to the core kernel isn't trivial. In the long term, more kernel components will likely need to use TDX, so a reference-based approach to VMXON/VMXOFF will likely be needed.

Signed-off-by: Kai Huang
---

- v3 -> v5 (no feedback on v4):

 - Added a wrapper of __seamcall() to print the error code if SEAMCALL fails.
 - Made seamcall_on_each_cpu() void.
 - Removed 'seamcall_ret' and 'tdx_module_out' from 'struct seamcall_ctx', as they must be local variables.
 - Added comments to tdx_init() and one paragraph to the changelog to explain that the caller should handle VMXON.
 - Called out that after shutdown, no "TDX module" SEAMCALL can be made.

---
 arch/x86/virt/vmx/tdx/tdx.c | 65 ++++++++++++++++++++++++++++++++++++-
 arch/x86/virt/vmx/tdx/tdx.h |  5 +++
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 1f9d8108eeea..31ce4522100a 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -13,6 +13,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -123,6 +125,61 @@ static int __init tdx_early_detect(void)
 }
 early_initcall(tdx_early_detect);
 
+/*
+ * Data structure to make SEAMCALL on multiple CPUs concurrently.
+ * @err is set to -EFAULT when SEAMCALL fails on any cpu.
+ */
+struct seamcall_ctx {
+	u64 fn;
+	u64 rcx;
+	u64 rdx;
+	u64 r8;
+	u64 r9;
+	atomic_t err;
+};
+
+/*
+ * Wrapper of __seamcall(). It additionally prints out the error
+ * information if __seamcall() fails normally. It is useful during
+ * the module initialization by providing more information to the user.
+ */ +static u64 seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9, + struct tdx_module_output *out) +{ + u64 ret; + + ret = __seamcall(fn, rcx, rdx, r8, r9, out); + if (ret == TDX_SEAMCALL_VMFAILINVALID || !ret) + return ret; + + pr_err("SEAMCALL failed: leaf: 0x%llx, error: 0x%llx\n", fn, ret); + if (out) + pr_err("SEAMCALL additional output: rcx 0x%llx, rdx 0x%llx, r8 0x%llx, r9 0x%llx, r10 0x%llx, r11 0x%llx.\n", + out->rcx, out->rdx, out->r8, out->r9, out->r10, out->r11); + + return ret; +} + +static void seamcall_smp_call_function(void *data) +{ + struct seamcall_ctx *sc = data; + struct tdx_module_output out; + u64 ret; + + ret = seamcall(sc->fn, sc->rcx, sc->rdx, sc->r8, sc->r9, &out); + if (ret) + atomic_set(&sc->err, -EFAULT); +} + +/* + * Call the SEAMCALL on all online CPUs concurrently. Caller to check + * @sc->err to determine whether any SEAMCALL failed on any cpu. + */ +static void seamcall_on_each_cpu(struct seamcall_ctx *sc) +{ + on_each_cpu(seamcall_smp_call_function, sc, true); +} + /* * Detect and initialize the TDX module. * @@ -138,7 +195,10 @@ static int init_tdx_module(void) static void shutdown_tdx_module(void) { - /* TODO: Shut down the TDX module */ + struct seamcall_ctx sc = { .fn = TDH_SYS_LP_SHUTDOWN }; + + seamcall_on_each_cpu(&sc); + tdx_module_status = TDX_MODULE_SHUTDOWN; } @@ -221,6 +281,9 @@ bool platform_tdx_enabled(void) * CPU hotplug is temporarily disabled internally to prevent any cpu * from going offline. * + * Caller also needs to guarantee all CPUs are in VMX operation during + * this function, otherwise Oops may be triggered. + * * This function can be called in parallel by multiple callers. 
  *
  * Return:
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index f1a2dfb978b1..95d4eb884134 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -46,6 +46,11 @@
 #define TDX_KEYID_NUM(_keyid_part)	((u32)((_keyid_part) >> 32))
 
+/*
+ * TDX module SEAMCALL leaf functions
+ */
+#define TDH_SYS_LP_SHUTDOWN	44
+
 /*
  * Do not put any hardware-defined TDX structure representations below this
  * comment!

From patchwork Wed Jun 22 11:16:32 2022
X-Patchwork-Submitter: "Huang, Kai"
X-Patchwork-Id: 12890539
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com,
 len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com,
 reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org,
 ak@linux.intel.com, kirill.shutemov@linux.intel.com,
 sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com,
 kai.huang@intel.com
Subject: [PATCH v5 09/22] x86/virt/tdx: Detect TDX module by doing module global initialization
Date: Wed, 22 Jun 2022 23:16:32 +1200
Message-Id: <168253372035629fda418628af278a1c3044cda6.1655894131.git.kai.huang@intel.com>
X-Mailing-List: kvm@vger.kernel.org

So far the TDX module has not been detected.  __seamcall() returns
TDX_SEAMCALL_VMFAILINVALID when the target SEAM software module is not
loaded, so a SEAMCALL to the TDX module can be used to detect it.

The first step of initializing the module is to call TDH.SYS.INIT once
on any logical cpu to do module global initialization.  Use it to
detect the module, since it needs to be done anyway.

Signed-off-by: Kai Huang
---

- v3 -> v5 (no feedback on v4):
 - Add detecting TDX module.

---
 arch/x86/virt/vmx/tdx/tdx.c | 39 +++++++++++++++++++++++++++++++++++--
 arch/x86/virt/vmx/tdx/tdx.h |  1 +
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 31ce4522100a..de4efc16ed45 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -180,6 +180,21 @@ static void seamcall_on_each_cpu(struct seamcall_ctx *sc)
 	on_each_cpu(seamcall_smp_call_function, sc, true);
 }
 
+/*
+ * Do TDX module global initialization.  It also detects whether the
+ * module has been loaded or not.
+ */
+static int tdx_module_init_global(void)
+{
+	u64 ret;
+
+	ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL);
+	if (ret == TDX_SEAMCALL_VMFAILINVALID)
+		return -ENODEV;
+
+	return ret ? -EFAULT : 0;
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -189,8 +204,28 @@ static void seamcall_on_each_cpu(struct seamcall_ctx *sc)
  */
 static int init_tdx_module(void)
 {
-	/* The TDX module hasn't been detected */
-	return -ENODEV;
+	int ret;
+
+	/*
+	 * Whether the TDX module is loaded is still unknown.  SEAMCALL
+	 * instruction fails with VMfailInvalid if the target SEAM
+	 * software module is not loaded, so it can be used to detect the
+	 * module.
+	 *
+	 * The first step of initializing the TDX module is module global
+	 * initialization.  Just use it to detect the module.
+	 */
+	ret = tdx_module_init_global();
+	if (ret)
+		goto out;
+
+	/*
+	 * Return -EINVAL until all steps of TDX module initialization
+	 * process are done.
+	 */
+	ret = -EINVAL;
+out:
+	return ret;
 }
 
 static void shutdown_tdx_module(void)
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 95d4eb884134..9e694789eb91 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -49,6 +49,7 @@
 /*
  * TDX module SEAMCALL leaf functions
  */
+#define TDH_SYS_INIT		33
 #define TDH_SYS_LP_SHUTDOWN	44
 
 /*

From patchwork Wed Jun 22 11:16:59 2022
X-Patchwork-Submitter: "Huang, Kai"
X-Patchwork-Id: 12890540
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com,
 len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com,
 reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org,
 ak@linux.intel.com, kirill.shutemov@linux.intel.com,
 sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com,
 kai.huang@intel.com
Subject: [PATCH v5 10/22] x86/virt/tdx: Do logical-cpu scope TDX module initialization
Date: Wed, 22 Jun 2022 23:16:59 +1200
Message-Id: <41c84840443d7ba5fa2d23a5b96784d704a32a05.1655894131.git.kai.huang@intel.com>
X-Mailing-List: kvm@vger.kernel.org

After the global module initialization, the next step is logical-cpu
scope module initialization.  Logical-cpu initialization requires
calling TDH.SYS.LP.INIT on all BIOS-enabled CPUs.  This SEAMCALL can run
concurrently on all CPUs.

Use the helper introduced for shutting down the module to do logical-cpu
scope initialization.

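The "run one leaf on every CPU and report a single error" pattern that
this step relies on can be modelled in user space.  The following is
only a sketch under stated assumptions, not the kernel code: a
sequential loop stands in for on_each_cpu(), and leaf_fn, call_ctx,
call_one(), call_on_each_cpu() and the two stub leaves are hypothetical
names made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one SEAMCALL on one CPU; 0 means success. */
typedef uint64_t (*leaf_fn)(int cpu, uint64_t leaf);

/* Shared context: the leaf to call and a sticky error flag. */
struct call_ctx {
	uint64_t fn;
	atomic_int err;
};

/* Per-CPU body: record a failure, never clear one set by another CPU. */
static void call_one(struct call_ctx *ctx, leaf_fn call, int cpu)
{
	if (call(cpu, ctx->fn))
		atomic_store(&ctx->err, -EFAULT);
}

/* A sequential loop stands in for on_each_cpu(); returns 0 or -EFAULT. */
static int call_on_each_cpu(uint64_t leaf, leaf_fn call, int nr_cpus)
{
	struct call_ctx ctx;
	int cpu;

	assert(call && nr_cpus > 0);
	ctx.fn = leaf;
	atomic_init(&ctx.err, 0);

	for (cpu = 0; cpu < nr_cpus; cpu++)
		call_one(&ctx, call, cpu);

	return atomic_load(&ctx.err);
}

/* Stub leaf that always succeeds. */
static uint64_t leaf_ok(int cpu, uint64_t leaf)
{
	(void)cpu;
	(void)leaf;
	return 0;
}

/* Stub leaf that fails on CPU 2 only. */
static uint64_t leaf_fail_cpu2(int cpu, uint64_t leaf)
{
	(void)leaf;
	return cpu == 2 ? 1 : 0;
}
```

Because the error flag is only ever set and never cleared, a failure on
any single CPU is enough to make the whole step fail, which is the
behaviour the caller needs before moving on to the next initialization
step.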
Signed-off-by: Kai Huang
---
 arch/x86/virt/vmx/tdx/tdx.c | 15 +++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  1 +
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index de4efc16ed45..f3f6e20aa30e 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -195,6 +195,15 @@ static int tdx_module_init_global(void)
 	return ret ? -EFAULT : 0;
 }
 
+static int tdx_module_init_cpus(void)
+{
+	struct seamcall_ctx sc = { .fn = TDH_SYS_LP_INIT };
+
+	seamcall_on_each_cpu(&sc);
+
+	return atomic_read(&sc.err);
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -219,6 +228,12 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out;
 
+	/* Logical-cpu scope initialization */
+	ret = tdx_module_init_cpus();
+	if (ret)
+		goto out;
+
+
 	/*
 	 * Return -EINVAL until all steps of TDX module initialization
 	 * process are done.
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 9e694789eb91..56164bf27378 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -50,6 +50,7 @@
  * TDX module SEAMCALL leaf functions
  */
 #define TDH_SYS_INIT		33
+#define TDH_SYS_LP_INIT		35
 #define TDH_SYS_LP_SHUTDOWN	44
 
 /*

From patchwork Wed Jun 22 11:17:00 2022
X-Patchwork-Submitter: "Huang, Kai"
X-Patchwork-Id: 12890541
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com,
 len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com,
 reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org,
 ak@linux.intel.com, kirill.shutemov@linux.intel.com,
 sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com,
 kai.huang@intel.com
Subject: [PATCH v5 11/22] x86/virt/tdx: Get information about TDX module and TDX-capable memory
Date: Wed, 22 Jun 2022 23:17:00 +1200
Message-Id: <24bab3c465ccc84b046f032e88bb1c79e6b17bed.1655894131.git.kai.huang@intel.com>
X-Mailing-List: kvm@vger.kernel.org

TDX provides increased levels of memory confidentiality and integrity.
This requires special hardware support for features like memory
encryption and storage of memory integrity checksums.  Not all memory
satisfies these requirements.

As a result, TDX introduced the concept of a "Convertible Memory Region"
(CMR).  During boot, the firmware builds a list of all of the memory
ranges which can provide the TDX security guarantees.  The list of these
ranges, along with TDX module information, is available to the kernel by
querying the TDX module via the TDH.SYS.INFO SEAMCALL.

The host kernel can choose whether or not to use all convertible memory
regions as TDX-usable memory.  Before the TDX module is ready to create
any TDX guests, the kernel needs to configure the TDX-usable memory
regions by passing an array of "TD Memory Regions" (TDMRs) to the TDX
module.  Constructing the TDMR array requires information about both the
TDX module (TDSYSINFO_STRUCT) and the Convertible Memory Regions.  Call
TDH.SYS.INFO to get this information as preparation.

Use static variables for both TDSYSINFO_STRUCT and the CMR array to
avoid having to pass them as function arguments when constructing the
TDMR array.  They are also too big to be put on the stack anyway.
Also, KVM needs the TDSYSINFO_STRUCT to create TDX guests.

Signed-off-by: Kai Huang
---

- v3 -> v5 (no feedback on v4):
 - Renamed sanitize_cmrs() to check_cmrs().
 - Removed unnecessary sanity check against tdx_sysinfo and
   tdx_cmr_array actual size returned by TDH.SYS.INFO.
 - Changed -EFAULT to -EINVAL in couple places.
 - Added comments around tdx_sysinfo and tdx_cmr_array saying they are
   used by TDH.SYS.INFO ABI.
 - Changed to pass 'tdx_sysinfo' and 'tdx_cmr_array' as function
   arguments in tdx_get_sysinfo().
 - Changed to only print BIOS-CMR when check_cmrs() fails.

---
 arch/x86/virt/vmx/tdx/tdx.c | 137 ++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  61 ++++++++++++++++
 2 files changed, 198 insertions(+)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index f3f6e20aa30e..1bc97756bc0d 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -45,6 +45,11 @@ static enum tdx_module_status_t tdx_module_status;
 /* Prevent concurrent attempts on TDX detection and initialization */
 static DEFINE_MUTEX(tdx_module_lock);
 
+/* Below two are used in TDH.SYS.INFO SEAMCALL ABI */
+static struct tdsysinfo_struct tdx_sysinfo;
+static struct cmr_info tdx_cmr_array[MAX_CMRS] __aligned(CMR_INFO_ARRAY_ALIGNMENT);
+static int tdx_cmr_num;
+
 /* Detect whether CPU supports SEAM */
 static int detect_seam(void)
 {
@@ -204,6 +209,135 @@ static int tdx_module_init_cpus(void)
 	return atomic_read(&sc.err);
 }
 
+static inline bool cmr_valid(struct cmr_info *cmr)
+{
+	return !!cmr->size;
+}
+
+static void print_cmrs(struct cmr_info *cmr_array, int cmr_num,
+		       const char *name)
+{
+	int i;
+
+	for (i = 0; i < cmr_num; i++) {
+		struct cmr_info *cmr = &cmr_array[i];
+
+		pr_info("%s : [0x%llx, 0x%llx)\n", name,
+				cmr->base, cmr->base + cmr->size);
+	}
+}
+
+/*
+ * Check the CMRs reported by TDH.SYS.INFO and update the actual number
+ * of CMRs.  The CMRs returned by the TDH.SYS.INFO may contain invalid
+ * CMRs after the last valid CMR, but there should be no invalid CMRs
+ * between two valid CMRs.  Check and update the actual number of CMRs
+ * by dropping all tail empty CMRs.
+ */
+static int check_cmrs(struct cmr_info *cmr_array, int *actual_cmr_num)
+{
+	int cmr_num = *actual_cmr_num;
+	int i, j;
+
+	/*
+	 * Intel TDX module spec, 20.7.3 CMR_INFO:
+	 *
+	 *   TDH.SYS.INFO leaf function returns a MAX_CMRS (32) entry
+	 *   array of CMR_INFO entries. The CMRs are sorted from the
+	 *   lowest base address to the highest base address, and they
+	 *   are non-overlapping.
+	 *
+	 * This implies that BIOS may generate invalid empty entries
+	 * if total CMRs are less than 32.  Skip them manually.
+	 */
+	for (i = 0; i < cmr_num; i++) {
+		struct cmr_info *cmr = &cmr_array[i];
+		struct cmr_info *prev_cmr = NULL;
+
+		/* Skip further invalid CMRs */
+		if (!cmr_valid(cmr))
+			break;
+
+		if (i > 0)
+			prev_cmr = &cmr_array[i - 1];
+
+		/*
+		 * It is a TDX firmware bug if CMRs are not
+		 * in address ascending order.
+		 */
+		if (prev_cmr && ((prev_cmr->base + prev_cmr->size) >
+					cmr->base)) {
+			print_cmrs(cmr_array, cmr_num, "BIOS-CMR");
+			pr_err("Firmware bug: CMRs not in address ascending order.\n");
+			return -EINVAL;
+		}
+	}
+
+	/*
+	 * Also a sane BIOS should never generate invalid CMR(s) between
+	 * two valid CMRs.  Sanity check this and simply return error in
+	 * this case.
+	 *
+	 * By reaching here @i is the index of the first invalid CMR (or
+	 * cmr_num).  Starting with next entry of @i since it has already
+	 * been checked.
+	 */
+	for (j = i + 1; j < cmr_num; j++) {
+		if (cmr_valid(&cmr_array[j])) {
+			print_cmrs(cmr_array, cmr_num, "BIOS-CMR");
+			pr_err("Firmware bug: invalid CMR(s) before valid CMRs.\n");
+			return -EINVAL;
+		}
+	}
+
+	/*
+	 * Trim all tail invalid empty CMRs.  BIOS should generate at
+	 * least one valid CMR, otherwise it's a TDX firmware bug.
+	 */
+	if (i == 0) {
+		print_cmrs(cmr_array, cmr_num, "BIOS-CMR");
+		pr_err("Firmware bug: No valid CMR.\n");
+		return -EINVAL;
+	}
+
+	/* Update the actual number of CMRs */
+	*actual_cmr_num = i;
+
+	/* Print kernel checked CMRs */
+	print_cmrs(cmr_array, *actual_cmr_num, "Kernel-checked-CMR");
+
+	return 0;
+}
+
+static int tdx_get_sysinfo(struct tdsysinfo_struct *tdsysinfo,
+			   struct cmr_info *cmr_array,
+			   int *actual_cmr_num)
+{
+	struct tdx_module_output out;
+	u64 ret;
+
+	BUILD_BUG_ON(sizeof(struct tdsysinfo_struct) != TDSYSINFO_STRUCT_SIZE);
+
+	ret = seamcall(TDH_SYS_INFO, __pa(tdsysinfo), TDSYSINFO_STRUCT_SIZE,
+			__pa(cmr_array), MAX_CMRS, &out);
+	if (ret)
+		return -EFAULT;
+
+	/* R9 contains the actual entries written to the CMR array. */
+	*actual_cmr_num = out.r9;
+
+	pr_info("TDX module: vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u",
+		tdsysinfo->vendor_id, tdsysinfo->major_version,
+		tdsysinfo->minor_version, tdsysinfo->build_date,
+		tdsysinfo->build_num);
+
+	/*
+	 * check_cmrs() updates the actual number of CMRs by dropping all
+	 * tail invalid CMRs.
+	 */
+	return check_cmrs(cmr_array, actual_cmr_num);
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -233,6 +367,9 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out;
 
+	ret = tdx_get_sysinfo(&tdx_sysinfo, tdx_cmr_array, &tdx_cmr_num);
+	if (ret)
+		goto out;
 
 	/*
 	 * Return -EINVAL until all steps of TDX module initialization
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 56164bf27378..63b1edd11660 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -49,10 +49,71 @@
 /*
  * TDX module SEAMCALL leaf functions
  */
+#define TDH_SYS_INFO		32
 #define TDH_SYS_INIT		33
 #define TDH_SYS_LP_INIT		35
 #define TDH_SYS_LP_SHUTDOWN	44
 
+struct cmr_info {
+	u64	base;
+	u64	size;
+} __packed;
+
+#define MAX_CMRS			32
+#define CMR_INFO_ARRAY_ALIGNMENT	512
+
+struct cpuid_config {
+	u32	leaf;
+	u32	sub_leaf;
+	u32	eax;
+	u32	ebx;
+	u32	ecx;
+	u32	edx;
+} __packed;
+
+#define TDSYSINFO_STRUCT_SIZE		1024
+#define TDSYSINFO_STRUCT_ALIGNMENT	1024
+
+struct tdsysinfo_struct {
+	/* TDX-SEAM Module Info */
+	u32	attributes;
+	u32	vendor_id;
+	u32	build_date;
+	u16	build_num;
+	u16	minor_version;
+	u16	major_version;
+	u8	reserved0[14];
+	/* Memory Info */
+	u16	max_tdmrs;
+	u16	max_reserved_per_tdmr;
+	u16	pamt_entry_size;
+	u8	reserved1[10];
+	/* Control Struct Info */
+	u16	tdcs_base_size;
+	u8	reserved2[2];
+	u16	tdvps_base_size;
+	u8	tdvps_xfam_dependent_size;
+	u8	reserved3[9];
+	/* TD Capabilities */
+	u64	attributes_fixed0;
+	u64	attributes_fixed1;
+	u64	xfam_fixed0;
+	u64	xfam_fixed1;
+	u8	reserved4[32];
+	u32	num_cpuid_config;
+	/*
+	 * The actual number of CPUID_CONFIG depends on above
+	 * 'num_cpuid_config'.  The size of 'struct tdsysinfo_struct'
+	 * is 1024B defined by TDX architecture.  Use a union with
+	 * specific padding to make 'sizeof(struct tdsysinfo_struct)'
+	 * equal to 1024.
+	 */
+	union {
+		struct cpuid_config	cpuid_configs[0];
+		u8			reserved5[892];
+	};
+} __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
+
 /*
  * Do not put any hardware-defined TDX structure representations below this
  * comment!
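
The 892-byte padding in the union above follows from the field sizes:
the fixed fields add up to 132 bytes, and 1024 - 132 = 892.  The
arithmetic can be double-checked in user space with the sketch below.
It assumes a GCC/Clang-style packed attribute, and tdsysinfo_model,
cpuid_config_model and max_cpuid_configs() are hypothetical names used
only for this illustration; the tail is modelled as a plain byte array
rather than a union.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of 'struct cpuid_config': six 32-bit fields, 24 bytes. */
struct cpuid_config_model {
	uint32_t leaf, sub_leaf, eax, ebx, ecx, edx;
} __attribute__((packed));

/* Mirror of the fixed part of 'struct tdsysinfo_struct' plus its tail. */
struct tdsysinfo_model {
	/* Module info: 4 + 4 + 4 + 2 + 2 + 2 + 14 = 32 bytes */
	uint32_t attributes;
	uint32_t vendor_id;
	uint32_t build_date;
	uint16_t build_num;
	uint16_t minor_version;
	uint16_t major_version;
	uint8_t  reserved0[14];
	/* Memory info: 2 + 2 + 2 + 10 = 16 bytes */
	uint16_t max_tdmrs;
	uint16_t max_reserved_per_tdmr;
	uint16_t pamt_entry_size;
	uint8_t  reserved1[10];
	/* Control struct info: 2 + 2 + 2 + 1 + 9 = 16 bytes */
	uint16_t tdcs_base_size;
	uint8_t  reserved2[2];
	uint16_t tdvps_base_size;
	uint8_t  tdvps_xfam_dependent_size;
	uint8_t  reserved3[9];
	/* TD capabilities: 4 * 8 + 32 + 4 = 68 bytes */
	uint64_t attributes_fixed0;
	uint64_t attributes_fixed1;
	uint64_t xfam_fixed0;
	uint64_t xfam_fixed1;
	uint8_t  reserved4[32];
	uint32_t num_cpuid_config;
	/* Tail area: 1024 - 132 = 892 bytes */
	uint8_t  cpuid_area[892];
} __attribute__((packed));

/* Compile-time checks of the offsets and the total size. */
static_assert(offsetof(struct tdsysinfo_model, max_tdmrs) == 32,
	      "module info block is 32 bytes");
static_assert(offsetof(struct tdsysinfo_model, num_cpuid_config) == 128,
	      "num_cpuid_config sits at offset 128");
static_assert(sizeof(struct tdsysinfo_model) == 1024,
	      "whole structure is 1024 bytes");

/* Number of whole CPUID_CONFIG entries that fit in the tail area. */
static size_t max_cpuid_configs(void)
{
	return sizeof(((struct tdsysinfo_model *)0)->cpuid_area) /
	       sizeof(struct cpuid_config_model);
}
```

With 24-byte entries, at most 37 CPUID_CONFIG entries fit in the tail
area of this model.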
From patchwork Wed Jun 22 11:17:01 2022
X-Patchwork-Submitter: "Huang, Kai"
X-Patchwork-Id: 12890542
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com,
 len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com,
 reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org,
 ak@linux.intel.com, kirill.shutemov@linux.intel.com,
 sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com,
 kai.huang@intel.com
Subject: [PATCH v5 12/22] x86/virt/tdx: Convert all memory regions in memblock to TDX memory
Date: Wed, 22 Jun 2022 23:17:01 +1200
Message-Id: <8288396be7fedd10521a28531e138579594d757a.1655894131.git.kai.huang@intel.com>
X-Mailing-List: kvm@vger.kernel.org

The TDX module reports a list of Convertible Memory Regions (CMR) to
identify which memory regions can be used as TDX memory, but they are
not automatically usable by the TDX module.  The kernel needs to choose
which convertible memory regions become TDX memory and configure those
regions by passing an array of "TD Memory Regions" (TDMR) to the TDX
module.

To avoid having to modify the page allocator to distinguish TDX and
non-TDX memory allocation, convert all memory regions in the memblock to
TDX memory.  As the first step, sanity check that all memory regions in
memblock are fully covered by CMRs so the above conversion is guaranteed
to work.  This also works because ACPI memory hotplug (reported as a
BIOS bug) and driver-managed memory hotplug are both prevented when TDX
is enabled by BIOS, so no new non-TDX-convertible memory can end up in
the page allocator.

Select ARCH_KEEP_MEMBLOCK when CONFIG_INTEL_TDX_HOST is enabled to keep
memblock after boot so it can be used during the TDX module
initialization.

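The coverage check described above can be sketched in plain C as
follows.  This is a user-space illustration only: struct region,
covered_by_one() and first_uncovered() are hypothetical names, and the
RAM ranges are passed in as an array instead of being walked from
memblock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical range [base, base + size). */
struct region {
	uint64_t base;
	uint64_t size;
};

/* True if [start, end) lies entirely inside one convertible region. */
static bool covered_by_one(const struct region *cmrs, size_t n_cmrs,
			   uint64_t start, uint64_t end)
{
	size_t i;

	for (i = 0; i < n_cmrs; i++) {
		if (start >= cmrs[i].base &&
		    end <= cmrs[i].base + cmrs[i].size)
			return true;
	}

	return false;
}

/* Index of the first RAM range that is not covered, or -1 if all are. */
static int first_uncovered(const struct region *cmrs, size_t n_cmrs,
			   const struct region *ram, size_t n_ram)
{
	size_t i;

	assert(cmrs && ram);

	for (i = 0; i < n_ram; i++) {
		if (!covered_by_one(cmrs, n_cmrs, ram[i].base,
				    ram[i].base + ram[i].size))
			return (int)i;
	}

	return -1;
}
```

Note one consequence of checking against a single region at a time: a
RAM range that straddles two adjacent convertible regions is reported
as uncovered in this sketch, even though every byte of it is
convertible.
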
Also, explicitly exclude memory regions below the first 1MB from TDX
memory, because those regions may not be reported as convertible memory.
This is OK as the first 1MB is always reserved during kernel boot and
won't end up in the page allocator.

Signed-off-by: Kai Huang
---

- v3 -> v4 (no feedback on v4):
 - Changed to use memblock from e820.
 - Simplified changelog a lot.

---
 arch/x86/Kconfig            |   1 +
 arch/x86/virt/vmx/tdx/tdx.c | 100 ++++++++++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index efa830853e98..4988a91d5283 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1974,6 +1974,7 @@ config INTEL_TDX_HOST
 	depends on X86_64
 	depends on KVM_INTEL
 	select ARCH_HAS_CC_PLATFORM
+	select ARCH_KEEP_MEMBLOCK
 	help
 	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
 	  host and certain physical attacks.  This option enables necessary TDX
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 1bc97756bc0d..2b20d4a7a62b 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -15,6 +15,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -338,6 +340,91 @@ static int tdx_get_sysinfo(struct tdsysinfo_struct *tdsysinfo,
 	return check_cmrs(cmr_array, actual_cmr_num);
 }
 
+/*
+ * Skip the memory region below 1MB.  Return true if the entire
+ * region is skipped.  Otherwise, the updated range is returned.
+ */
+static bool pfn_range_skip_lowmem(unsigned long *p_start_pfn,
+				  unsigned long *p_end_pfn)
+{
+	u64 start, end;
+
+	start = *p_start_pfn << PAGE_SHIFT;
+	end = *p_end_pfn << PAGE_SHIFT;
+
+	if (start < SZ_1M)
+		start = SZ_1M;
+
+	if (start >= end)
+		return true;
+
+	*p_start_pfn = (start >> PAGE_SHIFT);
+
+	return false;
+}
+
+/*
+ * Walks over all memblock memory regions that are intended to be
+ * converted to TDX memory.  Essentially, it is all memblock memory
+ * regions excluding the low memory below 1MB.
+ *
+ * This is because on some TDX platforms the low memory below 1MB is
+ * not included in CMRs.  Excluding the low 1MB can still guarantee
+ * that the pages managed by the page allocator are always TDX memory,
+ * as the low 1MB is reserved during kernel boot and won't end up to
+ * the ZONE_DMA (see reserve_real_mode()).
+ */
+#define memblock_for_each_tdx_mem_pfn_range(i, p_start, p_end, p_nid)	\
+	for_each_mem_pfn_range(i, MAX_NUMNODES, p_start, p_end, p_nid)	\
+		if (!pfn_range_skip_lowmem(p_start, p_end))
+
+/* Check whether first range is the subrange of the second */
+static bool is_subrange(u64 r1_start, u64 r1_end, u64 r2_start, u64 r2_end)
+{
+	return r1_start >= r2_start && r1_end <= r2_end;
+}
+
+/* Check whether address range is covered by any CMR or not. */
+static bool range_covered_by_cmr(struct cmr_info *cmr_array, int cmr_num,
+				 u64 start, u64 end)
+{
+	int i;
+
+	for (i = 0; i < cmr_num; i++) {
+		struct cmr_info *cmr = &cmr_array[i];
+
+		if (is_subrange(start, end, cmr->base, cmr->base + cmr->size))
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * Check whether all memory regions in memblock are TDX convertible
+ * memory.  Return 0 if all memory regions are convertible, or error.
+ */
+static int check_memblock_tdx_convertible(void)
+{
+	unsigned long start_pfn, end_pfn;
+	int i;
+
+	memblock_for_each_tdx_mem_pfn_range(i, &start_pfn, &end_pfn, NULL) {
+		u64 start, end;
+
+		start = start_pfn << PAGE_SHIFT;
+		end = end_pfn << PAGE_SHIFT;
+		if (!range_covered_by_cmr(tdx_cmr_array, tdx_cmr_num, start,
+					end)) {
+			pr_err("[0x%llx, 0x%llx) is not fully convertible memory\n",
+					start, end);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -371,6 +458,19 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out;
 
+	/*
+	 * To avoid having to modify the page allocator to distinguish
+	 * TDX and non-TDX memory allocation, convert all memory regions
+	 * in memblock to TDX memory to make sure all pages managed by
+	 * the page allocator are TDX memory.
+	 *
+	 * Sanity check all memory regions are fully covered by CMRs to
+	 * make sure they are truly convertible.
+	 */
+	ret = check_memblock_tdx_convertible();
+	if (ret)
+		goto out;
+
 	/*
 	 * Return -EINVAL until all steps of TDX module initialization
 	 * process are done.

From patchwork Wed Jun 22 11:17:02 2022
X-Patchwork-Submitter: "Huang, Kai"
X-Patchwork-Id: 12890543
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com,
 len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com,
 reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org,
 ak@linux.intel.com, kirill.shutemov@linux.intel.com,
 sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com,
 kai.huang@intel.com
Subject: [PATCH v5 13/22] x86/virt/tdx: Add placeholder to construct TDMRs based on memblock
Date: Wed, 22 Jun 2022 23:17:02 +1200
Message-Id: <3f2fd2a4d09c146184fb45ed56326420b8097474.1655894131.git.kai.huang@intel.com>
X-Mailing-List: kvm@vger.kernel.org

TDX provides increased levels of memory confidentiality and integrity.
This requires special hardware support for features like memory
encryption and storage of memory integrity checksums.  Not all memory
satisfies these requirements.

As a result, TDX introduced the concept of a "Convertible Memory Region"
(CMR).
During boot, the firmware builds a list of all of the memory ranges
which can provide the TDX security guarantees. The list of these ranges
is available to the kernel by querying the TDX module.

The TDX architecture needs additional metadata to record things like
which TD guest "owns" a given page of memory. This metadata essentially
serves as the 'struct page' for the TDX module. The space for this
metadata is not reserved by the hardware up front and must be allocated
by the kernel and given to the TDX module.

Since this metadata consumes space, the VMM can choose whether or not to
allocate it for a given area of convertible memory. If it chooses not
to, the memory cannot receive TDX protections and can not be used by TDX
guests as private memory.

For every memory region that the VMM wants to use as TDX memory, it sets
up a "TD Memory Region" (TDMR). Each TDMR represents a physically
contiguous convertible range and must also have its own physically
contiguous metadata table, referred to as a Physical Address Metadata
Table (PAMT), to track status for each page in the TDMR range.

Unlike a CMR, each TDMR requires 1G granularity and alignment. To
support physical RAM areas that don't meet those strict requirements,
each TDMR permits a number of internal "reserved areas" which can be
placed over memory holes. If PAMT metadata is placed within a TDMR it
must be covered by one of these reserved areas.

Let's summarize the concepts:

 CMR - Firmware-enumerated physical ranges that support TDX. CMRs
       are 4K aligned.
TDMR - Physical address range which is chosen by the kernel to support
       TDX. 1G granularity and alignment required. Each TDMR has
       reserved areas where TDX memory holes and overlapping PAMTs can
       be put into.
PAMT - Physically contiguous TDX metadata. One table for each page size
       per TDMR. Roughly 1/256th of TDMR in size. 256G TDMR = ~1G
       PAMT.
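
As a rough illustration of the "~1/256th" figure above, the PAMT size
for one TDMR could be estimated as in the sketch below. This is a
standalone userspace example, not code from this series: the 16-byte
PAMT entry size is an assumption (the kernel reads the real value from
the TDX module), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/* Assumption for illustration only; the TDX module reports the real size. */
#define ASSUMED_PAMT_ENTRY_SIZE 16ULL

/* Round x up to a multiple of a, where a is a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* One PAMT holds one entry per page of the given size, rounded up to 4K. */
static uint64_t pamt_size(uint64_t tdmr_size, uint64_t page_size)
{
	/* TDMRs are sized in whole gigabytes. */
	assert(tdmr_size % SZ_1G == 0);

	return align_up((tdmr_size / page_size) * ASSUMED_PAMT_ENTRY_SIZE,
			SZ_4K);
}

/* A TDMR needs one PAMT per supported page size: 4K, 2M and 1G. */
static uint64_t pamt_total_size(uint64_t tdmr_size)
{
	return pamt_size(tdmr_size, SZ_4K) +
	       pamt_size(tdmr_size, SZ_2M) +
	       pamt_size(tdmr_size, SZ_1G);
}
```

Under these assumptions, pamt_total_size(256 * SZ_1G) works out to
1G + 2M + 4K. The 4K table dominates, which is where the "256G TDMR =
~1G PAMT" rule of thumb comes from.
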
As one step of initializing the TDX module, the kernel configures
TDX-usable memory by passing an array of TDMRs to the TDX module.

Constructing the array of TDMRs consists of the steps below:

1) Create TDMRs to cover all memory regions that the TDX module can use;
2) Allocate and set up PAMT for each TDMR;
3) Set up reserved areas for each TDMR.

Add a placeholder to construct TDMRs to do the above steps after all
memblock memory regions are verified to be convertible. Always free
TDMRs at the end of the initialization (whether it succeeds or not) as
TDMRs are only used during the initialization.

Signed-off-by: Kai Huang
---

- v3 -> v5 (no feedback on v4):
 - Moved calculating TDMR size to this patch.
 - Changed to use alloc_pages_exact() to allocate buffer for all TDMRs
   once, instead of allocating each TDMR individually.
 - Removed "crypto protection" in the changelog.
 - -EFAULT -> -EINVAL in a couple of places.

---
 arch/x86/virt/vmx/tdx/tdx.c | 73 +++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h | 23 ++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 2b20d4a7a62b..645addb1bea2 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -17,6 +17,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -425,6 +427,55 @@ static int check_memblock_tdx_convertible(void)
 	return 0;
 }
 
+/* Calculate the actual TDMR_INFO size */
+static inline int cal_tdmr_size(void)
+{
+	int tdmr_sz;
+
+	/*
+	 * The actual size of TDMR_INFO depends on the maximum number
+	 * of reserved areas.
+	 */
+	tdmr_sz = sizeof(struct tdmr_info);
+	tdmr_sz += sizeof(struct tdmr_reserved_area) *
+		   tdx_sysinfo.max_reserved_per_tdmr;
+
+	/*
+	 * TDX requires each TDMR_INFO to be 512-byte aligned. Always
+	 * round up TDMR_INFO size to the 512-byte boundary.
+	 */
+	return ALIGN(tdmr_sz, TDMR_INFO_ALIGNMENT);
+}
+
+static struct tdmr_info *alloc_tdmr_array(int *array_sz)
+{
+	/*
+	 * TDX requires each TDMR_INFO to be 512-byte aligned.
+	 * Use alloc_pages_exact() to allocate all TDMRs at once.
+	 * Each TDMR_INFO will still be 512-byte aligned since
+	 * cal_tdmr_size() always return 512-byte aligned size.
+	 */
+	*array_sz = cal_tdmr_size() * tdx_sysinfo.max_tdmrs;
+
+	/*
+	 * Zero the buffer so 'struct tdmr_info::size' can be
+	 * used to determine whether a TDMR is valid.
+	 */
+	return alloc_pages_exact(*array_sz, GFP_KERNEL | __GFP_ZERO);
+}
+
+/*
+ * Construct an array of TDMRs to cover all memory regions in memblock.
+ * This makes sure all pages managed by the page allocator are TDX
+ * memory. The actual number of TDMRs is kept to @tdmr_num.
+ */
+static int construct_tdmrs_memeblock(struct tdmr_info *tdmr_array,
+				     int *tdmr_num)
+{
+	/* Return -EINVAL until constructing TDMRs is done */
+	return -EINVAL;
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -434,6 +485,9 @@ static int check_memblock_tdx_convertible(void)
  */
 static int init_tdx_module(void)
 {
+	struct tdmr_info *tdmr_array;
+	int tdmr_array_sz;
+	int tdmr_num;
 	int ret;
 
 	/*
@@ -471,11 +525,30 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out;
 
+	/* Prepare enough space to construct TDMRs */
+	tdmr_array = alloc_tdmr_array(&tdmr_array_sz);
+	if (!tdmr_array) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Construct TDMRs to cover all memory regions in memblock */
+	ret = construct_tdmrs_memeblock(tdmr_array, &tdmr_num);
+	if (ret)
+		goto out_free_tdmrs;
+
 	/*
 	 * Return -EINVAL until all steps of TDX module initialization
 	 * process are done.
 	 */
 	ret = -EINVAL;
+out_free_tdmrs:
+	/*
+	 * The array of TDMRs is freed no matter the initialization is
+	 * successful or not. They are not needed anymore after the
+	 * module initialization.
+	 */
+	free_pages_exact(tdmr_array, tdmr_array_sz);
 out:
 	return ret;
 }
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 63b1edd11660..55d6c69ab900 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -114,6 +114,29 @@ struct tdsysinfo_struct {
 	};
 } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
 
+struct tdmr_reserved_area {
+	u64 offset;
+	u64 size;
+} __packed;
+
+#define TDMR_INFO_ALIGNMENT	512
+
+struct tdmr_info {
+	u64 base;
+	u64 size;
+	u64 pamt_1g_base;
+	u64 pamt_1g_size;
+	u64 pamt_2m_base;
+	u64 pamt_2m_size;
+	u64 pamt_4k_base;
+	u64 pamt_4k_size;
+	/*
+	 * Actual number of reserved areas depends on
+	 * 'struct tdsysinfo_struct'::max_reserved_per_tdmr.
+	 */
+	struct tdmr_reserved_area reserved_areas[0];
+} __packed __aligned(TDMR_INFO_ALIGNMENT);
+
 /*
  * Do not put any hardware-defined TDX structure representations below this
  * comment!

From patchwork Wed Jun 22 11:17:03 2022
X-Patchwork-Id: 12890548
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, kai.huang@intel.com
Subject: [PATCH v5 14/22] x86/virt/tdx: Create TDMRs to cover all memblock memory regions
Date: Wed, 22 Jun 2022 23:17:03 +1200
Message-Id:

The kernel configures TDX-usable memory regions by passing an array of
"TD Memory Regions" (TDMRs) to the TDX module.
Each TDMR contains the information of the base/size of a memory region,
the base/size of the associated Physical Address Metadata Table (PAMT)
and a list of reserved areas in the region.

Create a number of TDMRs according to the memblock memory regions. To
keep it simple, always try to create one TDMR for each memory region.
As the first step only set up the base/size for each TDMR.

Each TDMR must be 1G aligned and the size must be in 1G granularity.
This implies that one TDMR could cover multiple memory regions. If a
memory region spans the 1GB boundary and the former part is already
covered by the previous TDMR, just create a new TDMR for the remaining
part.

TDX only supports a limited number of TDMRs. Disable TDX if all TDMRs
are consumed but there are more memory regions to cover.

Signed-off-by: Kai Huang
---

- v3 -> v5 (no feedback on v4):
 - Removed allocating TDMR individually.
 - Improved changelog by using Dave's words.
 - Made TDMR_START() and TDMR_END() static inline functions.

---
 arch/x86/virt/vmx/tdx/tdx.c | 104 +++++++++++++++++++++++++++++++++++-
 1 file changed, 103 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 645addb1bea2..fd9f449b5395 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -427,6 +427,24 @@ static int check_memblock_tdx_convertible(void)
 	return 0;
 }
 
+/* TDMR must be 1gb aligned */
+#define TDMR_ALIGNMENT		BIT_ULL(30)
+#define TDMR_PFN_ALIGNMENT	(TDMR_ALIGNMENT >> PAGE_SHIFT)
+
+/* Align up and down the address to TDMR boundary */
+#define TDMR_ALIGN_DOWN(_addr)	ALIGN_DOWN((_addr), TDMR_ALIGNMENT)
+#define TDMR_ALIGN_UP(_addr)	ALIGN((_addr), TDMR_ALIGNMENT)
+
+static inline u64 tdmr_start(struct tdmr_info *tdmr)
+{
+	return tdmr->base;
+}
+
+static inline u64 tdmr_end(struct tdmr_info *tdmr)
+{
+	return tdmr->base + tdmr->size;
+}
+
 /* Calculate the actual TDMR_INFO size */
 static inline int cal_tdmr_size(void)
 {
@@ -464,6 +482,82 @@ static struct tdmr_info *alloc_tdmr_array(int *array_sz)
 	return alloc_pages_exact(*array_sz, GFP_KERNEL | __GFP_ZERO);
 }
 
+static struct tdmr_info *tdmr_array_entry(struct tdmr_info *tdmr_array,
+					  int idx)
+{
+	return (struct tdmr_info *)((unsigned long)tdmr_array +
+			cal_tdmr_size() * idx);
+}
+
+/*
+ * Create TDMRs to cover all memory regions in memblock. The actual
+ * number of TDMRs is set to @tdmr_num.
+ */
+static int create_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
+{
+	unsigned long start_pfn, end_pfn;
+	int i, nid, tdmr_idx = 0;
+
+	/*
+	 * Loop over all memory regions in memblock and create TDMRs to
+	 * cover them. To keep it simple, always try to use one TDMR to
+	 * cover memory region.
+	 */
+	memblock_for_each_tdx_mem_pfn_range(i, &start_pfn, &end_pfn, &nid) {
+		struct tdmr_info *tdmr;
+		u64 start, end;
+
+		tdmr = tdmr_array_entry(tdmr_array, tdmr_idx);
+		start = TDMR_ALIGN_DOWN(start_pfn << PAGE_SHIFT);
+		end = TDMR_ALIGN_UP(end_pfn << PAGE_SHIFT);
+
+		/*
+		 * If the current TDMR's size hasn't been initialized,
+		 * it is a new TDMR to cover the new memory region.
+		 * Otherwise, the current TDMR has already covered the
+		 * previous memory region. In the latter case, check
+		 * whether the current memory region has been fully or
+		 * partially covered by the current TDMR, since TDMR is
+		 * 1G aligned.
+		 */
+		if (tdmr->size) {
+			/*
+			 * Loop to the next memory region if the current
+			 * region has already fully covered by the
+			 * current TDMR.
+			 */
+			if (end <= tdmr_end(tdmr))
+				continue;
+
+			/*
+			 * If part of the current memory region has
+			 * already been covered by the current TDMR,
+			 * skip the already covered part.
+			 */
+			if (start < tdmr_end(tdmr))
+				start = tdmr_end(tdmr);
+
+			/*
+			 * Create a new TDMR to cover the current memory
+			 * region, or the remaining part of it.
+			 */
+			tdmr_idx++;
+			if (tdmr_idx >= tdx_sysinfo.max_tdmrs)
+				return -E2BIG;
+
+			tdmr = tdmr_array_entry(tdmr_array, tdmr_idx);
+		}
+
+		tdmr->base = start;
+		tdmr->size = end - start;
+	}
+
+	/* @tdmr_idx is always the index of last valid TDMR. */
+	*tdmr_num = tdmr_idx + 1;
+
+	return 0;
+}
+
 /*
  * Construct an array of TDMRs to cover all memory regions in memblock.
  * This makes sure all pages managed by the page allocator are TDX
@@ -472,8 +566,16 @@ static struct tdmr_info *alloc_tdmr_array(int *array_sz)
 static int construct_tdmrs_memeblock(struct tdmr_info *tdmr_array,
 				     int *tdmr_num)
 {
+	int ret;
+
+	ret = create_tdmrs(tdmr_array, tdmr_num);
+	if (ret)
+		goto err;
+
 	/* Return -EINVAL until constructing TDMRs is done */
-	return -EINVAL;
+	ret = -EINVAL;
+err:
+	return ret;
 }
 
 /*

From patchwork Wed Jun 22 11:17:04 2022
X-Patchwork-Id: 12890549
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, kai.huang@intel.com
Subject: [PATCH v5 15/22] x86/virt/tdx: Allocate and set up PAMTs for TDMRs
Date: Wed, 22 Jun 2022 23:17:04 +1200
Message-Id:

The TDX module uses additional metadata to record things like which
guest "owns" a given page of memory. This metadata, referred to as the
Physical Address Metadata Table (PAMT), essentially serves as the
'struct page' for the TDX module. PAMTs are not reserved by hardware up
front.
They must be allocated by the kernel and then given to the TDX module.

TDX supports 3 page sizes: 4K, 2M, and 1G. Each "TD Memory Region"
(TDMR) has 3 PAMTs to track the 3 supported page sizes. Each PAMT must
be a physically contiguous area from a Convertible Memory Region (CMR).
However, the PAMTs which track pages in one TDMR do not need to reside
within that TDMR but can be anywhere in CMRs. If one PAMT overlaps with
any TDMR, the overlapping part must be reported as a reserved area in
that particular TDMR.

Use alloc_contig_pages() since a PAMT must be a physically contiguous
area and it may be potentially large (~1/256th of the size of the given
TDMR). The downside is that alloc_contig_pages() may fail at runtime.
One (bad) mitigation is to launch a TD guest early during system boot to
get those PAMTs allocated early, but the only real fix is to add a boot
option to allocate or reserve PAMTs during kernel boot.

TDX only supports a limited number of reserved areas per TDMR to cover
both PAMTs and memory holes within the given TDMR. If many PAMTs are
allocated within a single TDMR, the reserved areas may not be sufficient
to cover all of them.

Adopt the following policies when allocating PAMTs for a given TDMR:

  - Allocate three PAMTs of the TDMR in one contiguous chunk to minimize
    the total number of reserved areas consumed for PAMTs.
  - Try to first allocate PAMT from the local node of the TDMR for better
    NUMA locality.

Also dump out how many pages are allocated for PAMTs when the TDX module
is initialized successfully.

Signed-off-by: Kai Huang
---

- v3 -> v5 (no feedback on v4):
 - Used memblock to get the NUMA node for given TDMR.
 - Removed tdmr_get_pamt_sz() helper but use open-code instead.
 - Changed to use 'switch .. case..' for each TDX supported page size in
   tdmr_get_pamt_sz() (the original __tdmr_get_pamt_sz()).
 - Added printing out memory used for PAMT allocation when TDX module is
   initialized successfully.
 - Explained downside of alloc_contig_pages() in changelog.
 - Addressed other minor comments.

---
 arch/x86/Kconfig            |   1 +
 arch/x86/virt/vmx/tdx/tdx.c | 200 ++++++++++++++++++++++++++++++++++++
 2 files changed, 201 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4988a91d5283..ec496e96d120 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1973,6 +1973,7 @@ config INTEL_TDX_HOST
 	depends on CPU_SUP_INTEL
 	depends on X86_64
 	depends on KVM_INTEL
+	depends on CONTIG_ALLOC
 	select ARCH_HAS_CC_PLATFORM
 	select ARCH_KEEP_MEMBLOCK
 	help
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index fd9f449b5395..36260dd7e69f 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -558,6 +558,196 @@ static int create_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
 	return 0;
 }
 
+/* Page sizes supported by TDX */
+enum tdx_page_sz {
+	TDX_PG_4K,
+	TDX_PG_2M,
+	TDX_PG_1G,
+	TDX_PG_MAX,
+};
+
+/*
+ * Calculate PAMT size given a TDMR and a page size. The returned
+ * PAMT size is always aligned up to 4K page boundary.
+ */
+static unsigned long tdmr_get_pamt_sz(struct tdmr_info *tdmr,
+				      enum tdx_page_sz pgsz)
+{
+	unsigned long pamt_sz;
+	int pamt_entry_nr;
+
+	switch (pgsz) {
+	case TDX_PG_4K:
+		pamt_entry_nr = tdmr->size >> PAGE_SHIFT;
+		break;
+	case TDX_PG_2M:
+		pamt_entry_nr = tdmr->size >> PMD_SHIFT;
+		break;
+	case TDX_PG_1G:
+		pamt_entry_nr = tdmr->size >> PUD_SHIFT;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return 0;
+	}
+
+	pamt_sz = pamt_entry_nr * tdx_sysinfo.pamt_entry_size;
+	/* TDX requires PAMT size must be 4K aligned */
+	pamt_sz = ALIGN(pamt_sz, PAGE_SIZE);
+
+	return pamt_sz;
+}
+
+/*
+ * Pick a NUMA node on which to allocate this TDMR's metadata.
+ *
+ * This is imprecise since TDMRs are 1G aligned and NUMA nodes might
+ * not be. If the TDMR covers more than one node, just use the _first_
+ * one. This can lead to small areas of off-node metadata for some
+ * memory.
+ */
+static int tdmr_get_nid(struct tdmr_info *tdmr)
+{
+	unsigned long start_pfn, end_pfn;
+	int i, nid;
+
+	/* Find the first memory region covered by the TDMR */
+	memblock_for_each_tdx_mem_pfn_range(i, &start_pfn, &end_pfn, &nid) {
+		if (end_pfn > (tdmr_start(tdmr) >> PAGE_SHIFT))
+			return nid;
+	}
+
+	/*
+	 * No memory region found for this TDMR. It cannot happen since
+	 * when one TDMR is created, it must cover at least one (or
+	 * partial) memory region.
+	 */
+	WARN_ON_ONCE(1);
+	return 0;
+}
+
+static int tdmr_set_up_pamt(struct tdmr_info *tdmr)
+{
+	unsigned long pamt_base[TDX_PG_MAX];
+	unsigned long pamt_size[TDX_PG_MAX];
+	unsigned long tdmr_pamt_base;
+	unsigned long tdmr_pamt_size;
+	enum tdx_page_sz pgsz;
+	struct page *pamt;
+	int nid;
+
+	nid = tdmr_get_nid(tdmr);
+
+	/*
+	 * Calculate the PAMT size for each TDX supported page size
+	 * and the total PAMT size.
+	 */
+	tdmr_pamt_size = 0;
+	for (pgsz = TDX_PG_4K; pgsz < TDX_PG_MAX; pgsz++) {
+		pamt_size[pgsz] = tdmr_get_pamt_sz(tdmr, pgsz);
+		tdmr_pamt_size += pamt_size[pgsz];
+	}
+
+	/*
+	 * Allocate one chunk of physically contiguous memory for all
+	 * PAMTs. This helps minimize the PAMT's use of reserved areas
+	 * in overlapped TDMRs.
+	 */
+	pamt = alloc_contig_pages(tdmr_pamt_size >> PAGE_SHIFT, GFP_KERNEL,
+			nid, &node_online_map);
+	if (!pamt)
+		return -ENOMEM;
+
+	/* Calculate PAMT base and size for all supported page sizes. */
+	tdmr_pamt_base = page_to_pfn(pamt) << PAGE_SHIFT;
+	for (pgsz = TDX_PG_4K; pgsz < TDX_PG_MAX; pgsz++) {
+		pamt_base[pgsz] = tdmr_pamt_base;
+		tdmr_pamt_base += pamt_size[pgsz];
+	}
+
+	tdmr->pamt_4k_base = pamt_base[TDX_PG_4K];
+	tdmr->pamt_4k_size = pamt_size[TDX_PG_4K];
+	tdmr->pamt_2m_base = pamt_base[TDX_PG_2M];
+	tdmr->pamt_2m_size = pamt_size[TDX_PG_2M];
+	tdmr->pamt_1g_base = pamt_base[TDX_PG_1G];
+	tdmr->pamt_1g_size = pamt_size[TDX_PG_1G];
+
+	return 0;
+}
+
+static void tdmr_get_pamt(struct tdmr_info *tdmr, unsigned long *pamt_pfn,
+			  unsigned long *pamt_npages)
+{
+	unsigned long pamt_base, pamt_sz;
+
+	/*
+	 * The PAMT was allocated in one contiguous unit. The 4K PAMT
+	 * should always point to the beginning of that allocation.
+	 */
+	pamt_base = tdmr->pamt_4k_base;
+	pamt_sz = tdmr->pamt_4k_size + tdmr->pamt_2m_size + tdmr->pamt_1g_size;
+
+	*pamt_pfn = pamt_base >> PAGE_SHIFT;
+	*pamt_npages = pamt_sz >> PAGE_SHIFT;
+}
+
+static void tdmr_free_pamt(struct tdmr_info *tdmr)
+{
+	unsigned long pamt_pfn, pamt_npages;
+
+	tdmr_get_pamt(tdmr, &pamt_pfn, &pamt_npages);
+
+	/* Do nothing if PAMT hasn't been allocated for this TDMR */
+	if (!pamt_npages)
+		return;
+
+	if (WARN_ON_ONCE(!pamt_pfn))
+		return;
+
+	free_contig_range(pamt_pfn, pamt_npages);
+}
+
+static void tdmrs_free_pamt_all(struct tdmr_info *tdmr_array, int tdmr_num)
+{
+	int i;
+
+	for (i = 0; i < tdmr_num; i++)
+		tdmr_free_pamt(tdmr_array_entry(tdmr_array, i));
+}
+
+/* Allocate and set up PAMTs for all TDMRs */
+static int tdmrs_set_up_pamt_all(struct tdmr_info *tdmr_array, int tdmr_num)
+{
+	int i, ret = 0;
+
+	for (i = 0; i < tdmr_num; i++) {
+		ret = tdmr_set_up_pamt(tdmr_array_entry(tdmr_array, i));
+		if (ret)
+			goto err;
+	}
+
+	return 0;
+err:
+	tdmrs_free_pamt_all(tdmr_array, tdmr_num);
+	return ret;
+}
+
+static unsigned long tdmrs_get_pamt_pages(struct tdmr_info *tdmr_array,
+					  int tdmr_num)
+{
+	unsigned long pamt_npages = 0;
+	int i;
+
+	for (i = 0; i < tdmr_num; i++) {
+		unsigned long pfn, npages;
+
+		tdmr_get_pamt(tdmr_array_entry(tdmr_array, i), &pfn, &npages);
+		pamt_npages += npages;
+	}
+
+	return pamt_npages;
+}
+
 /*
  * Construct an array of TDMRs to cover all memory regions in memblock.
  * This makes sure all pages managed by the page allocator are TDX
@@ -572,8 +762,13 @@ static int construct_tdmrs_memeblock(struct tdmr_info *tdmr_array,
 	if (ret)
 		goto err;
 
+	ret = tdmrs_set_up_pamt_all(tdmr_array, *tdmr_num);
+	if (ret)
+		goto err;
+
 	/* Return -EINVAL until constructing TDMRs is done */
 	ret = -EINVAL;
+	tdmrs_free_pamt_all(tdmr_array, *tdmr_num);
 err:
 	return ret;
 }
@@ -644,6 +839,11 @@ static int init_tdx_module(void)
 	 * process are done.
 	 */
 	ret = -EINVAL;
+	if (ret)
+		tdmrs_free_pamt_all(tdmr_array, tdmr_num);
+	else
+		pr_info("%lu pages allocated for PAMT.\n",
+			tdmrs_get_pamt_pages(tdmr_array, tdmr_num));
 out_free_tdmrs:
 	/*
 	 * The array of TDMRs is freed no matter the initialization is

From patchwork Wed Jun 22 11:17:05 2022
X-Patchwork-Id: 12890553
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, kai.huang@intel.com
Subject: [PATCH v5 16/22] x86/virt/tdx: Set up reserved areas for all TDMRs
Date: Wed, 22 Jun 2022 23:17:05 +1200
Message-Id: <984ae2b9201876e9ac22399cf36d26ad6eff1007.1655894131.git.kai.huang@intel.com>

As the last step of constructing TDMRs, set up reserved areas for all
TDMRs.
For each TDMR, mark all memory holes within the TDMR as reserved areas.
Likewise, for every PAMT that overlaps with the TDMR, mark the
overlapping part as a reserved area too.

Signed-off-by: Kai Huang
---
 arch/x86/virt/vmx/tdx/tdx.c | 160 +++++++++++++++++++++++++++++++++++-
 1 file changed, 158 insertions(+), 2 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 36260dd7e69f..86d98c47bd37 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -748,6 +749,157 @@ static unsigned long tdmrs_get_pamt_pages(struct tdmr_info *tdmr_array,
 	return pamt_npages;
 }
 
+static int tdmr_add_rsvd_area(struct tdmr_info *tdmr, int *p_idx,
+			      u64 addr, u64 size)
+{
+	struct tdmr_reserved_area *rsvd_areas = tdmr->reserved_areas;
+	int idx = *p_idx;
+
+	/* Reserved area must be 4K aligned in offset and size */
+	if (WARN_ON(addr & ~PAGE_MASK || size & ~PAGE_MASK))
+		return -EINVAL;
+
+	/* Cannot exceed maximum reserved areas supported by TDX */
+	if (idx >= tdx_sysinfo.max_reserved_per_tdmr)
+		return -E2BIG;
+
+	rsvd_areas[idx].offset = addr - tdmr->base;
+	rsvd_areas[idx].size = size;
+
+	*p_idx = idx + 1;
+
+	return 0;
+}
+
+/* Compare function called by sort() for TDMR reserved areas */
+static int rsvd_area_cmp_func(const void *a, const void *b)
+{
+	struct tdmr_reserved_area *r1 = (struct tdmr_reserved_area *)a;
+	struct tdmr_reserved_area *r2 = (struct tdmr_reserved_area *)b;
+
+	if (r1->offset + r1->size <= r2->offset)
+		return -1;
+	if (r1->offset >= r2->offset + r2->size)
+		return 1;
+
+	/* Reserved areas cannot overlap. Caller should guarantee. */
+	WARN_ON_ONCE(1);
+	return -1;
+}
+
+/* Set up reserved areas for a TDMR, including memory holes and PAMTs */
+static int tdmr_set_up_rsvd_areas(struct tdmr_info *tdmr,
+				  struct tdmr_info *tdmr_array,
+				  int tdmr_num)
+{
+	unsigned long start_pfn, end_pfn;
+	int rsvd_idx, i, ret = 0;
+	u64 prev_end;
+
+	/* Mark holes between memory regions as reserved */
+	rsvd_idx = 0;
+	prev_end = tdmr_start(tdmr);
+	memblock_for_each_tdx_mem_pfn_range(i, &start_pfn, &end_pfn, NULL) {
+		u64 start, end;
+
+		start = start_pfn << PAGE_SHIFT;
+		end = end_pfn << PAGE_SHIFT;
+
+		/* Break if this region is after the TDMR */
+		if (start >= tdmr_end(tdmr))
+			break;
+
+		/* Exclude regions before this TDMR */
+		if (end < tdmr_start(tdmr))
+			continue;
+
+		/*
+		 * Skip if no hole exists before this region. "<=" is
+		 * used because one memory region might span two TDMRs
+		 * (when the previous TDMR covers part of this region).
+		 * In this case the start address of this region is
+		 * smaller than the start address of the second TDMR.
+		 *
+		 * Update the prev_end to the end of this region where
+		 * the possible memory hole starts.
+		 */
+		if (start <= prev_end) {
+			prev_end = end;
+			continue;
+		}
+
+		/* Add the hole before this region */
+		ret = tdmr_add_rsvd_area(tdmr, &rsvd_idx, prev_end,
+				start - prev_end);
+		if (ret)
+			return ret;
+
+		prev_end = end;
+	}
+
+	/* Add the hole after the last region if it exists. */
+	if (prev_end < tdmr_end(tdmr)) {
+		ret = tdmr_add_rsvd_area(tdmr, &rsvd_idx, prev_end,
+				tdmr_end(tdmr) - prev_end);
+		if (ret)
+			return ret;
+	}
+
+	/*
+	 * If any PAMT overlaps with this TDMR, the overlapping part
+	 * must also be put to the reserved area too. Walk over all
+	 * TDMRs to find out those overlapping PAMTs and put them to
+	 * reserved areas.
+ */ + for (i = 0; i < tdmr_num; i++) { + struct tdmr_info *tmp = tdmr_array_entry(tdmr_array, i); + u64 pamt_start, pamt_end; + + pamt_start = tmp->pamt_4k_base; + pamt_end = pamt_start + tmp->pamt_4k_size + + tmp->pamt_2m_size + tmp->pamt_1g_size; + + /* Skip PAMTs outside of the given TDMR */ + if ((pamt_end <= tdmr_start(tdmr)) || + (pamt_start >= tdmr_end(tdmr))) + continue; + + /* Only mark the part within the TDMR as reserved */ + if (pamt_start < tdmr_start(tdmr)) + pamt_start = tdmr_start(tdmr); + if (pamt_end > tdmr_end(tdmr)) + pamt_end = tdmr_end(tdmr); + + ret = tdmr_add_rsvd_area(tdmr, &rsvd_idx, pamt_start, + pamt_end - pamt_start); + if (ret) + return ret; + } + + /* TDX requires reserved areas listed in address ascending order */ + sort(tdmr->reserved_areas, rsvd_idx, sizeof(struct tdmr_reserved_area), + rsvd_area_cmp_func, NULL); + + return 0; +} + +static int tdmrs_set_up_rsvd_areas_all(struct tdmr_info *tdmr_array, + int tdmr_num) +{ + int i; + + for (i = 0; i < tdmr_num; i++) { + int ret; + + ret = tdmr_set_up_rsvd_areas(tdmr_array_entry(tdmr_array, i), + tdmr_array, tdmr_num); + if (ret) + return ret; + } + + return 0; +} + /* * Construct an array of TDMRs to cover all memory regions in memblock. 
* This makes sure all pages managed by the page allocator are TDX @@ -766,8 +918,12 @@ static int construct_tdmrs_memeblock(struct tdmr_info *tdmr_array, if (ret) goto err; - /* Return -EINVAL until constructing TDMRs is done */ - ret = -EINVAL; + ret = tdmrs_set_up_rsvd_areas_all(tdmr_array, *tdmr_num); + if (ret) + goto err_free_pamts; + + return 0; +err_free_pamts: tdmrs_free_pamt_all(tdmr_array, *tdmr_num); err: return ret; From patchwork Wed Jun 22 11:17:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Kai" X-Patchwork-Id: 12890552 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8FE1EC433EF for ; Wed, 22 Jun 2022 11:19:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357631AbiFVLTk (ORCPT ); Wed, 22 Jun 2022 07:19:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56352 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357564AbiFVLTP (ORCPT ); Wed, 22 Jun 2022 07:19:15 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A2E1E3CA43; Wed, 22 Jun 2022 04:18:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1655896688; x=1687432688; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xNPSVdToXKtUmSd8+2K4u5Tm3rdMWLpFdFZbw1UIUkQ=; b=gOKW0ucG+3d3QnuKaPuSXL+LnHzg2GOaHhd98zZaKSizYosCMsjn/ptf u7gNPchWBwluw1rrzUV/OwE+4tL1oI2vn01msEvxCZvCO+qFLdMiEced8 lir3jHiF6Q54VVJ5hM2ZcxZlDWl7ef6JobLHri1QtDi4sGnmWy9dx39/+ bZBg7WJaroxtez63sNlDoCsCqy6RxXryqTVhJCvO7g8vl6PXaJYavk8Zs BDPeN08cIzCsTpb5b4dJP3/XxaTeq2v99B/dDpUcWnQ6oljOEE+MgWM4d 
3z4+yWEJsmMJ0KYinwB1OlXFab4aeVC+0jSQIWta5yM0wp9Fw3qI4KixJ g==; X-IronPort-AV: E=McAfee;i="6400,9594,10385"; a="305841141" X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="305841141" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:17:43 -0700 X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="730302322" Received: from jmatsis-mobl.amr.corp.intel.com (HELO khuang2-desk.gar.corp.intel.com) ([10.209.178.197]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:17:40 -0700 From: Kai Huang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, kai.huang@intel.com Subject: [PATCH v5 17/22] x86/virt/tdx: Reserve TDX module global KeyID Date: Wed, 22 Jun 2022 23:17:06 +1200 Message-Id: X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org TDX module initialization requires to use one TDX private KeyID as the global KeyID to protect the TDX module metadata. The global KeyID is configured to the TDX module along with TDMRs. Just reserve the first TDX private KeyID as the global KeyID. Keep the global KeyID as a static variable as KVM will need to use it too. 
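For illustration only, the reservation can be pictured as splitting the TDX private KeyID range into the global KeyID and the remainder. The sketch below is a standalone userspace approximation, not the code in this patch: 'struct tdx_keyid_range' and reserve_global_keyid() are hypothetical names, and treating the remaining KeyIDs as available to guests is an assumption about how later users (e.g. KVM) consume the range.

```c
#include <stdint.h>

/* Hypothetical stand-in for the private KeyID range detected at boot. */
struct tdx_keyid_range {
	uint32_t start;	/* first TDX private KeyID */
	uint32_t num;	/* number of TDX private KeyIDs */
};

/*
 * Take the first private KeyID as the global KeyID and report what is
 * left in 'rest' (assumed here to be what guests could later use).
 * Returns 0 on success, -1 if the range contains no KeyID at all.
 */
int reserve_global_keyid(const struct tdx_keyid_range *all,
			 uint32_t *global_keyid,
			 struct tdx_keyid_range *rest)
{
	if (all->num == 0)
		return -1;

	*global_keyid = all->start;
	rest->start = all->start + 1;
	rest->num = all->num - 1;

	return 0;
}
```

With a private KeyID range of [16, 64), this picks 16 as the global KeyID and leaves [17, 64) in 'rest'.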
Signed-off-by: Kai Huang --- arch/x86/virt/vmx/tdx/tdx.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 86d98c47bd37..df87a9f9ee24 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -55,6 +55,9 @@ static struct tdsysinfo_struct tdx_sysinfo; static struct cmr_info tdx_cmr_array[MAX_CMRS] __aligned(CMR_INFO_ARRAY_ALIGNMENT); static int tdx_cmr_num; +/* TDX module global KeyID. Used in TDH.SYS.CONFIG ABI. */ +static u32 tdx_global_keyid; + /* Detect whether CPU supports SEAM */ static int detect_seam(void) { @@ -990,6 +993,12 @@ static int init_tdx_module(void) if (ret) goto out_free_tdmrs; + /* + * Reserve the first TDX KeyID as global KeyID to protect + * TDX module metadata. + */ + tdx_global_keyid = tdx_keyid_start; + /* * Return -EINVAL until all steps of TDX module initialization * process are done. From patchwork Wed Jun 22 11:17:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Kai" X-Patchwork-Id: 12890555 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D5B9CCA47E for ; Wed, 22 Jun 2022 11:19:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357424AbiFVLTs (ORCPT ); Wed, 22 Jun 2022 07:19:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57420 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357421AbiFVLTR (ORCPT ); Wed, 22 Jun 2022 07:19:17 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 23D6028728; Wed, 22 Jun 2022 04:18:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; 
q=dns/txt; s=Intel; t=1655896691; x=1687432691; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MTnlyo2iHdW3o9QWbyFBSiwA5KoWcP+Jxwe6wW1Mb8M=; b=c8uC54EtovDBuhl2JLJTGjnVzYA70ti07Nhfdg53GH9Ywj8myBUs5v4T Sy0LokuDLBoanVAtylzokGcVtAwXUAR/fGX6KLDGJmST+IOrgc9rtZTpm iK4GhGou/vbiax7GfgwFT9J9qcouXphOSdP5tC9VOhmQP/2RuJQriJNDZ JvT6/fogqZjaSw1Ce/iNQ0OarqJolVMPg6vEqblaHvTVTUpsfZDW7OfoC Dqslg9d2i42HOyr87YhVQ1PeMuZLodiQHcEjcKOZP3SqtS8rsMDwJqvaq Wo6TsuQ1vX6NyiyEQpmZxVXZkNq1W7asj1u85rynNpXj88kHHhodD+P/m g==; X-IronPort-AV: E=McAfee;i="6400,9594,10385"; a="305841148" X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="305841148" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:17:46 -0700 X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="730302335" Received: from jmatsis-mobl.amr.corp.intel.com (HELO khuang2-desk.gar.corp.intel.com) ([10.209.178.197]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:17:43 -0700 From: Kai Huang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, kai.huang@intel.com Subject: [PATCH v5 18/22] x86/virt/tdx: Configure TDX module with TDMRs and global KeyID Date: Wed, 22 Jun 2022 23:17:07 +1200 Message-Id: <2e7d7e7e36e81ec9f13f8c796dee24517169b39c.1655894131.git.kai.huang@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org After the TDX-usable memory regions are constructed in an array of TDMRs and the global KeyID is 
reserved, configure them to the TDX module using TDH.SYS.CONFIG SEAMCALL. TDH.SYS.CONFIG can only be called once and can be done on any logical cpu. Signed-off-by: Kai Huang --- arch/x86/virt/vmx/tdx/tdx.c | 38 +++++++++++++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 2 ++ 2 files changed, 40 insertions(+) diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index df87a9f9ee24..06e26379b632 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -932,6 +933,37 @@ static int construct_tdmrs_memeblock(struct tdmr_info *tdmr_array, return ret; } +static int config_tdx_module(struct tdmr_info *tdmr_array, int tdmr_num, + u64 global_keyid) +{ + u64 *tdmr_pa_array; + int i, array_sz; + u64 ret; + + /* + * TDMR_INFO entries are configured to the TDX module via an + * array of the physical address of each TDMR_INFO. TDX module + * requires the array itself to be 512-byte aligned. Round up + * the array size to 512-byte aligned so the buffer allocated + * by kzalloc() will meet the alignment requirement. + */ + array_sz = ALIGN(tdmr_num * sizeof(u64), TDMR_INFO_PA_ARRAY_ALIGNMENT); + tdmr_pa_array = kzalloc(array_sz, GFP_KERNEL); + if (!tdmr_pa_array) + return -ENOMEM; + + for (i = 0; i < tdmr_num; i++) + tdmr_pa_array[i] = __pa(tdmr_array_entry(tdmr_array, i)); + + ret = seamcall(TDH_SYS_CONFIG, __pa(tdmr_pa_array), tdmr_num, + global_keyid, 0, NULL); + + /* Free the array as it is not required any more. */ + kfree(tdmr_pa_array); + + return ret ? -EFAULT : 0; +} + /* * Detect and initialize the TDX module. 
* @@ -999,11 +1031,17 @@ static int init_tdx_module(void) */ tdx_global_keyid = tdx_keyid_start; + /* Pass the TDMRs and the global KeyID to the TDX module */ + ret = config_tdx_module(tdmr_array, tdmr_num, tdx_global_keyid); + if (ret) + goto out_free_pamts; + /* * Return -EINVAL until all steps of TDX module initialization * process are done. */ ret = -EINVAL; +out_free_pamts: if (ret) tdmrs_free_pamt_all(tdmr_array, tdmr_num); else diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index 55d6c69ab900..b9bc499b965b 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -53,6 +53,7 @@ #define TDH_SYS_INIT 33 #define TDH_SYS_LP_INIT 35 #define TDH_SYS_LP_SHUTDOWN 44 +#define TDH_SYS_CONFIG 45 struct cmr_info { u64 base; @@ -120,6 +121,7 @@ struct tdmr_reserved_area { } __packed; #define TDMR_INFO_ALIGNMENT 512 +#define TDMR_INFO_PA_ARRAY_ALIGNMENT 512 struct tdmr_info { u64 base; From patchwork Wed Jun 22 11:17:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Kai" X-Patchwork-Id: 12890556 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 22BE2C43334 for ; Wed, 22 Jun 2022 11:19:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357677AbiFVLTu (ORCPT ); Wed, 22 Jun 2022 07:19:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57812 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357511AbiFVLTS (ORCPT ); Wed, 22 Jun 2022 07:19:18 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 48FDB3CA69; Wed, 22 Jun 2022 04:18:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; 
i=@intel.com; q=dns/txt; s=Intel; t=1655896692; x=1687432692; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=lYZ0J/QJ0w2QknVOB8G43XbPW/02GAYwoVqrSVBVKU8=; b=Q7oHsxtxsbMYeaUCoAmVHb4Rp9Geupbu9PcObvzir///TG4cUoSQjRML aPBs/T3ph68jkrkK+XffcrIBd6lPN+I3rAq5T23Lc21O4kLmDtxWDmvRU zDVAUOJhJqq8BL4VsO6yGe/XX92VWAKFFxhcoZOHx6Za1agK7BDW72wKB oX9S9NmS97e2aYmK7GOnC4TFT7NJFZl2SDNrMqNb6XGg5NmtKJPAxfQ9b JQAVDpFe644VIp1Yd+qr4wlOsFppuYYpIv8pfTn7YuqCz7ZooY8KL7KgT A44V4DrfGE3CfZ0KP3+lS2lxJSSw40mHCHO3RJ2dLcdY7yS4YN1LBf4Gw A==; X-IronPort-AV: E=McAfee;i="6400,9594,10385"; a="305841164" X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="305841164" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:17:50 -0700 X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="730302351" Received: from jmatsis-mobl.amr.corp.intel.com (HELO khuang2-desk.gar.corp.intel.com) ([10.209.178.197]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:17:46 -0700 From: Kai Huang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, kai.huang@intel.com Subject: [PATCH v5 19/22] x86/virt/tdx: Configure global KeyID on all packages Date: Wed, 22 Jun 2022 23:17:08 +1200 Message-Id: <756655ead5cb8307033409436cf74029c842dc09.1655894131.git.kai.huang@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org After the array of TDMRs and the global KeyID are configured to the TDX module, use 
TDH.SYS.KEY.CONFIG to configure the key of the global KeyID on all packages. TDH.SYS.KEY.CONFIG must be done on one (any) cpu for each package. And it cannot run concurrently on different CPUs. Implement a helper to run SEAMCALL on one cpu for each package one by one, and use it to configure the global KeyID on all packages. Intel hardware doesn't guarantee cache coherency across different KeyIDs. The kernel needs to flush PAMT's dirty cachelines (associated with KeyID 0) before the TDX module uses the global KeyID to access the PAMT. Following the TDX module specification, flush cache before configuring the global KeyID on all packages. Given the PAMT size can be large (~1/256th of system RAM), just use WBINVD on all CPUs to flush. Signed-off-by: Kai Huang --- arch/x86/virt/vmx/tdx/tdx.c | 83 ++++++++++++++++++++++++++++++++++++- arch/x86/virt/vmx/tdx/tdx.h | 1 + 2 files changed, 82 insertions(+), 2 deletions(-) diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 06e26379b632..b9777a353835 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -194,6 +194,46 @@ static void seamcall_on_each_cpu(struct seamcall_ctx *sc) on_each_cpu(seamcall_smp_call_function, sc, true); } +/* + * Call one SEAMCALL on one (any) cpu for each physical package in + * serialized way. Return immediately in case of any error if + * SEAMCALL fails on any cpu. + * + * Note for serialized calls 'struct seamcall_ctx::err' doesn't have + * to be atomic, but for simplicity just reuse it instead of adding + * a new one. 
+ */ +static int seamcall_on_each_package_serialized(struct seamcall_ctx *sc) +{ + cpumask_var_t packages; + int cpu, ret = 0; + + if (!zalloc_cpumask_var(&packages, GFP_KERNEL)) + return -ENOMEM; + + for_each_online_cpu(cpu) { + if (cpumask_test_and_set_cpu(topology_physical_package_id(cpu), + packages)) + continue; + + ret = smp_call_function_single(cpu, seamcall_smp_call_function, + sc, true); + if (ret) + break; + + /* + * Doesn't have to use atomic_read(), but it doesn't + * hurt either. + */ + ret = atomic_read(&sc->err); + if (ret) + break; + } + + free_cpumask_var(packages); + return ret; +} + /* * Do TDX module global initialization. It also detects whether the * module has been loaded or not. @@ -964,6 +1004,21 @@ static int config_tdx_module(struct tdmr_info *tdmr_array, int tdmr_num, return ret ? -EFAULT : 0; } +static int config_global_keyid(void) +{ + struct seamcall_ctx sc = { .fn = TDH_SYS_KEY_CONFIG }; + + /* + * Configure the key of the global KeyID on all packages by + * calling TDH.SYS.KEY.CONFIG on all packages. + * + * TDH.SYS.KEY.CONFIG may fail with entropy error (which is + * a recoverable error). Assume this is exceedingly rare and + * just return error if encountered instead of retrying. + */ + return seamcall_on_each_package_serialized(&sc); +} + /* * Detect and initialize the TDX module. * @@ -1036,15 +1091,39 @@ static int init_tdx_module(void) if (ret) goto out_free_pamts; + /* + * Hardware doesn't guarantee cache coherency across different + * KeyIDs. The kernel needs to flush PAMT's dirty cachelines + * (associated with KeyID 0) before the TDX module can use the + * global KeyID to access the PAMT. Given PAMTs are potentially + * large (~1/256th of system RAM), just use WBINVD on all cpus + * to flush the cache. + * + * Follow the TDX spec to flush cache before configuring the + * global KeyID on all packages. 
+ */ + wbinvd_on_all_cpus(); + + /* Config the key of global KeyID on all packages */ + ret = config_global_keyid(); + if (ret) + goto out_free_pamts; + /* * Return -EINVAL until all steps of TDX module initialization * process are done. */ ret = -EINVAL; out_free_pamts: - if (ret) + if (ret) { + /* + * Part of PAMT may already have been initialized by + * TDX module. Flush cache before returning PAMT back + * to the kernel. + */ + wbinvd_on_all_cpus(); tdmrs_free_pamt_all(tdmr_array, tdmr_num); - else + } else pr_info("%lu pages allocated for PAMT.\n", tdmrs_get_pamt_pages(tdmr_array, tdmr_num)); out_free_tdmrs: diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index b9bc499b965b..2d25a93b89ef 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -49,6 +49,7 @@ /* * TDX module SEAMCALL leaf functions */ +#define TDH_SYS_KEY_CONFIG 31 #define TDH_SYS_INFO 32 #define TDH_SYS_INIT 33 #define TDH_SYS_LP_INIT 35 From patchwork Wed Jun 22 11:17:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Kai" X-Patchwork-Id: 12890550 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8D2BBC433EF for ; Wed, 22 Jun 2022 11:19:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357493AbiFVLTh (ORCPT ); Wed, 22 Jun 2022 07:19:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58210 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357187AbiFVLTN (ORCPT ); Wed, 22 Jun 2022 07:19:13 -0400 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D0FF32C640; Wed, 22 Jun 2022 04:18:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1655896681; x=1687432681; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EU9UVN23kTwZDQsiPFFMtJVL0oXFbDaSRd4fV8w3EqE=; b=BSHoJECbbWIE3+Cx8n/OVyOTcdq7i0xMOcwKBVZGWE/F0WLpJvo+7qbs nTu9RLCHleJyrxXygYxsw8O+Ek1U5uFGA/qM9ALJ+eiLOPQHr8Cn69tJB BWF348VpHp+q6eOohWrBOgtB06lL02t6TOXb5VI4y665WHMyzRuEsHIZD 1t0YzmCF0HivDf5wkNv1VpByeKcdNIELRxzgsWP2KjiPlKM0ndNu1KUj9 OT/1DUsrUR2qIlgYAr2XudLW9EOr76KGQu8OMefCQD9on42hIrlEiUO3i /zVEAk5u2lnVkJRmqVn0J9+70GiWKHfzF7uIBuhj8O7YWc3mUAKnxRglx A==; X-IronPort-AV: E=McAfee;i="6400,9594,10385"; a="366713427" X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="366713427" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:18:01 -0700 X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="834065918" Received: from jmatsis-mobl.amr.corp.intel.com (HELO khuang2-desk.gar.corp.intel.com) ([10.209.178.197]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:17:58 -0700 From: Kai Huang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, kai.huang@intel.com Subject: [PATCH v5 20/22] x86/virt/tdx: Initialize all TDMRs Date: Wed, 22 Jun 2022 23:17:48 +1200 Message-Id: <58db9a30a179907aa9331e45900df7395d17c80c.1655894131.git.kai.huang@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Initialize TDMRs via TDH.SYS.TDMR.INIT as the last step to complete the TDX 
initialization. All TDMRs need to be initialized using TDH.SYS.TDMR.INIT SEAMCALL before the memory pages can be used by the TDX module. The time to initialize TDMR is proportional to the size of the TDMR because TDH.SYS.TDMR.INIT internally initializes the PAMT entries using the global KeyID. To avoid long latency caused in one SEAMCALL, TDH.SYS.TDMR.INIT only initializes an (implementation-specific) subset of PAMT entries of one TDMR in one invocation. The caller needs to call TDH.SYS.TDMR.INIT iteratively until all PAMT entries of the given TDMR are initialized. TDH.SYS.TDMR.INITs can run concurrently on multiple CPUs as long as they are initializing different TDMRs. To keep it simple, just initialize all TDMRs one by one. On a 2-socket machine with 2.2G CPUs and 64GB memory, each TDH.SYS.TDMR.INIT roughly takes ~7us on average, and it takes roughly ~100ms to complete initializing all TDMRs while system is idle. Signed-off-by: Kai Huang --- arch/x86/virt/vmx/tdx/tdx.c | 70 ++++++++++++++++++++++++++++++++++--- arch/x86/virt/vmx/tdx/tdx.h | 1 + 2 files changed, 66 insertions(+), 5 deletions(-) diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index b9777a353835..da1af1b60c35 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1019,6 +1019,65 @@ static int config_global_keyid(void) return seamcall_on_each_package_serialized(&sc); } +/* Initialize one TDMR */ +static int init_tdmr(struct tdmr_info *tdmr) +{ + u64 next; + + /* + * Initializing PAMT entries might be time-consuming (in + * proportion to the size of the requested TDMR). To avoid long + * latency in one SEAMCALL, TDH.SYS.TDMR.INIT only initializes + * an (implementation-defined) subset of PAMT entries in one + * invocation. + * + * Call TDH.SYS.TDMR.INIT iteratively until all PAMT entries + * of the requested TDMR are initialized (if next-to-initialize + * address matches the end address of the TDMR). 
+ */ + do { + struct tdx_module_output out; + u64 ret; + + ret = seamcall(TDH_SYS_TDMR_INIT, tdmr->base, 0, 0, 0, &out); + if (ret) + return -EFAULT; + /* + * RDX contains 'next-to-initialize' address if + * TDH.SYS.TDMR.INIT succeeded. + */ + next = out.rdx; + /* Allow scheduling when needed */ + if (need_resched()) + cond_resched(); + } while (next < tdmr->base + tdmr->size); + + return 0; +} + +/* Initialize all TDMRs */ +static int init_tdmrs(struct tdmr_info *tdmr_array, int tdmr_num) +{ + int i; + + /* + * Initialize TDMRs one-by-one for simplicity, though the TDX + * architecture does allow different TDMRs to be initialized in + * parallel on multiple CPUs. Parallel initialization could + * be added later when the time spent in the serialized scheme + * becomes a real concern. + */ + for (i = 0; i < tdmr_num; i++) { + int ret; + + ret = init_tdmr(tdmr_array_entry(tdmr_array, i)); + if (ret) + return ret; + } + + return 0; +} + /* * Detect and initialize the TDX module. * @@ -1109,11 +1168,12 @@ static int init_tdx_module(void) if (ret) goto out_free_pamts; - /* - * Return -EINVAL until all steps of TDX module initialization - * process are done.
- */ - ret = -EINVAL; + /* Initialize TDMRs to complete the TDX module initialization */ + ret = init_tdmrs(tdmr_array, tdmr_num); + if (ret) + goto out_free_pamts; + + tdx_module_status = TDX_MODULE_INITIALIZED; out_free_pamts: if (ret) { /* diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index 2d25a93b89ef..e0309558be13 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -53,6 +53,7 @@ #define TDH_SYS_INFO 32 #define TDH_SYS_INIT 33 #define TDH_SYS_LP_INIT 35 +#define TDH_SYS_TDMR_INIT 36 #define TDH_SYS_LP_SHUTDOWN 44 #define TDH_SYS_CONFIG 45 From patchwork Wed Jun 22 11:17:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Kai" X-Patchwork-Id: 12890551 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D00AEC43334 for ; Wed, 22 Jun 2022 11:19:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357349AbiFVLTi (ORCPT ); Wed, 22 Jun 2022 07:19:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58218 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357202AbiFVLTN (ORCPT ); Wed, 22 Jun 2022 07:19:13 -0400 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1DE5E3BA79; Wed, 22 Jun 2022 04:18:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1655896685; x=1687432685; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ntftiSqWbOB47rbOkwNsaejfB8DACjOwmAgRAleuan0=; b=eqr2QxU9910XUNTcB5dwMU9i+4GYvmx28hNpNglSTM8FvEM2TmEaeCrA MGrVkmI24lckxiHtcGyFuk7oEoXKH0Y2/Beon6dckMLoQoE6lLHEZGKNK 
wsDZjFHuhb/YHTOagfG5CrsjfF6lZm5gFcHk0y2ZMo4jB+XuonQNn9PMo KTmI9Q57YHIlV4Mn0mpKpa5hxqlb3lc1uvr1S8ZYNCCxVQUbKBi8yPlX7 FGjTaacF4ctrrgBokWYTGRH9Vp1zxBLWI5edaFXLfNpNr6XHtTvOisRGD bLUSCQ+372Gcfz2BYpCpr0OV6AsbkhTImI/K5UBkRGJzprxGOXWUiMQ5N Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10385"; a="366713436" X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="366713436" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:18:04 -0700 X-IronPort-AV: E=Sophos;i="5.92,212,1650956400"; d="scan'208";a="834065936" Received: from jmatsis-mobl.amr.corp.intel.com (HELO khuang2-desk.gar.corp.intel.com) ([10.209.178.197]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Jun 2022 04:18:01 -0700 From: Kai Huang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com, reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org, ak@linux.intel.com, kirill.shutemov@linux.intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com, kai.huang@intel.com Subject: [PATCH v5 21/22] x86/virt/tdx: Support kexec() Date: Wed, 22 Jun 2022 23:17:49 +1200 Message-Id: <9c0c25cbe70969e2aa3e68505cc7a7021a47a7ee.1655894131.git.kai.huang@intel.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org To support kexec(), if the TDX module is ever initialized, the kernel needs to flush all dirty cachelines associated with any TDX private KeyID, otherwise they may silently corrupt the new kernel. Following SME support, use wbinvd() to flush cache in stop_this_cpu(). Theoretically, cache flush is only needed when the TDX module has been initialized.
However, initializing the TDX module is done on demand at runtime, and it takes a mutex to read the module status. Instead, just check whether TDX is enabled by BIOS to decide whether to flush the cache. The current TDX module architecture doesn't play nicely with kexec(). The TDX module can only be initialized once during its lifetime, and there is no SEAMCALL to reset the module to give a new clean slate to the new kernel. Therefore, ideally, if the module is ever initialized, it's better to shut down the module. The new kernel won't be able to use TDX anyway (as it needs to go through the TDX module initialization process which will fail immediately at the first step). However, there's no guarantee the CPU is in VMX operation during kexec(). This means it's impractical to shut down the module. Just do nothing but leave the module open. Signed-off-by: Kai Huang --- arch/x86/kernel/process.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index dbaf12c43fe1..ff5449c23522 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -769,8 +769,15 @@ void __noreturn stop_this_cpu(void *dummy) * * Test the CPUID bit directly because the machine might've cleared * X86_FEATURE_SME due to cmdline options. + * + * Similar to SME, if the TDX module is ever initialized, the + * cachelines associated with any TDX private KeyID must be + * flushed before transitioning to the new kernel. The TDX module + * is initialized on demand, and it takes the mutex to read its + * status. Just check whether TDX is enabled by BIOS instead to + * flush cache.
 	 */
-	if (cpuid_eax(0x8000001f) & BIT(0))
+	if (cpuid_eax(0x8000001f) & BIT(0) || platform_tdx_enabled())
 		native_wbinvd();
 	for (;;) {
 		/*

From patchwork Wed Jun 22 11:17:50 2022
X-Patchwork-Submitter: "Huang, Kai"
X-Patchwork-Id: 12890554
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com,
 len.brown@intel.com, tony.luck@intel.com, rafael.j.wysocki@intel.com,
 reinette.chatre@intel.com, dan.j.williams@intel.com, peterz@infradead.org,
 ak@linux.intel.com, kirill.shutemov@linux.intel.com,
 sathyanarayanan.kuppuswamy@linux.intel.com, isaku.yamahata@intel.com,
 kai.huang@intel.com
Subject: [PATCH v5 22/22] Documentation/x86: Add documentation for TDX host support
Date: Wed, 22 Jun 2022 23:17:50 +1200
Message-Id: <0712bc0b05a0c6c42437fba68f82d9268ab3113e.1655894131.git.kai.huang@intel.com>

Add documentation for TDX host kernel support. There is already one
file Documentation/x86/tdx.rst containing documentation for TDX guest
internals. Also reuse it for TDX host kernel support.

Introduce a new level menu "TDX Guest Support" and move existing
materials under it, and add a new menu for TDX host kernel support.

Signed-off-by: Kai Huang
---
 Documentation/x86/tdx.rst | 190 +++++++++++++++++++++++++++++++++++---
 1 file changed, 179 insertions(+), 11 deletions(-)

diff --git a/Documentation/x86/tdx.rst b/Documentation/x86/tdx.rst
index b8fa4329e1a5..6c6b09ca6ba4 100644
--- a/Documentation/x86/tdx.rst
+++ b/Documentation/x86/tdx.rst
@@ -10,6 +10,174 @@ encrypting the guest memory. In TDX, a special module running in a special
 mode sits between the host and the guest and manages the guest/host
 separation.

+TDX Host Kernel Support
+=======================
+
+TDX introduces a new CPU mode called Secure Arbitration Mode (SEAM) and
+a new isolated range pointed to by the SEAM Range Register (SEAMRR). A
+CPU-attested software module called 'the TDX module' runs inside the new
+isolated range to provide the functionalities to manage and run protected
+VMs.
+
+TDX also leverages Intel Multi-Key Total Memory Encryption (MKTME) to
+provide crypto-protection to the VMs. TDX reserves part of MKTME KeyIDs
+as TDX private KeyIDs, which are only accessible within the SEAM mode.
+BIOS is responsible for partitioning legacy MKTME KeyIDs and TDX KeyIDs.
+
+To enable TDX, BIOS configures SEAMRR and TDX private KeyIDs consistently
+across all CPU packages. TDX doesn't trust BIOS. The MCHECK verifies
+all configurations from BIOS are correct and enables SEAMRR.
+
+After TDX is enabled in BIOS, the TDX module needs to be loaded into the
+SEAMRR range and properly initialized, before it can be used to create
+and run protected VMs.
+
+The TDX architecture doesn't require BIOS to load the TDX module, but
+current kernel assumes it is loaded by BIOS (i.e. either directly or by
+some UEFI shell tool) before booting to the kernel. Current kernel
+detects TDX and initializes the TDX module.
+
+TDX boot-time detection
+-----------------------
+
+Kernel detects TDX and the TDX private KeyIDs during kernel boot. Users
+can see the dmesg below if TDX is enabled by BIOS:
+
+| [..] tdx: SEAMRR enabled.
+| [..] tdx: TDX private KeyID range: [16, 64).
+| [..] tdx: TDX enabled by BIOS.
+
+TDX module detection and initialization
+---------------------------------------
+
+There is no CPUID or MSR to detect whether the TDX module is loaded. The
+kernel detects the TDX module by initializing it.
+
+The kernel talks to the TDX module via the new SEAMCALL instruction. The
+TDX module implements SEAMCALL leaf functions to allow the kernel to
+initialize it.
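A minimal sketch of how a KeyID range such as the `[16, 64)` in the boot-time dmesg could be derived from the BIOS partitioning. It assumes a 64-bit partitioning value holding the number of legacy MKTME KeyIDs in bits 31:0 and the number of TDX private KeyIDs in bits 63:32, and it assumes KeyID 0 is reserved. `struct keyid_range` and `tdx_keyid_range()` are hypothetical names used for illustration only; they are not taken from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open KeyID range [start, end), in the style of the dmesg line. */
struct keyid_range {
	uint32_t start;
	uint32_t end;
};

/*
 * ASSUMPTION: 'partitioning' mimics a BIOS-programmed 64-bit value with
 * the number of legacy MKTME KeyIDs in bits 31:0 and the number of TDX
 * private KeyIDs in bits 63:32. KeyID 0 is assumed to be reserved, so
 * the TDX private KeyIDs start right after the last MKTME KeyID.
 */
static struct keyid_range tdx_keyid_range(uint64_t partitioning)
{
	uint32_t nr_mktme = (uint32_t)(partitioning & 0xffffffffu);
	uint32_t nr_tdx = (uint32_t)(partitioning >> 32);
	struct keyid_range r;

	r.start = nr_mktme + 1;
	r.end = r.start + nr_tdx;
	assert(r.end >= r.start);	/* no wrap-around in this sketch */
	return r;
}
```

Under these assumptions, 15 legacy MKTME KeyIDs and 48 TDX private KeyIDs give the half-open range [16, 64).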
+
+Initializing the TDX module consumes roughly 1/256th of system RAM to
+use as 'metadata' for the TDX memory. It also takes additional CPU
+time to initialize that metadata along with the TDX module itself.
+Neither cost is trivial. Current kernel doesn't always initialize the
+TDX module during kernel boot, but provides a function tdx_init() to
+allow the caller to initialize TDX when it truly wants to use TDX::
+
+	ret = tdx_init();
+	if (ret)
+		goto no_tdx;
+	// TDX is ready to use
+
+Initializing the TDX module requires all logical CPUs to be online and
+in VMX operation (a requirement for making SEAMCALLs) during tdx_init().
+Currently, KVM is the only user of TDX. KVM always guarantees all online
+CPUs are in VMX operation when there's any VM. Current kernel doesn't
+handle entering VMX operation in tdx_init() but leaves this to the
+caller.
+
+Users can consult dmesg to see the presence of the TDX module, and whether
+it has been initialized.
+
+If the TDX module is not loaded, dmesg shows the following:
+
+| [..] tdx: TDX module is not loaded.
+
+If the TDX module is initialized successfully, dmesg shows something
+like the following:
+
+| [..] tdx: TDX module: vendor_id 0x8086, major_version 1, minor_version 0, build_date 20211209, build_num 160
+| [..] tdx: 65667 pages allocated for PAMT.
+| [..] tdx: TDX module initialized.
+
+If the TDX module failed to initialize, dmesg shows the following:
+
+| [..] tdx: Failed to initialize TDX module. Shut it down.
+
+TDX Interaction with Other Kernel Components
+--------------------------------------------
+
+CPU Hotplug
+~~~~~~~~~~~
+
+TDX doesn't work with ACPI CPU hotplug. To guarantee security, MCHECK
+verifies all logical CPUs for all packages during platform boot. Any
+hot-added CPU is not verified and thus cannot support TDX. A non-buggy
+BIOS should never deliver an ACPI CPU hot-add event to the kernel. Such
+an event is reported as a BIOS bug and the hot-added CPU is rejected.
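As a rough cross-check of the 1/256th figure given earlier for the TDX memory metadata, the sketch below estimates the PAMT footprint in 4 KiB pages from the amount of RAM. `pamt_pages_estimate()` is a hypothetical helper, not part of this series, and the real size depends on the TDX module and on which regions the kernel passes to it, so treat the result as an order-of-magnitude estimate only.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* 4 KiB pages, as counted in dmesg */

/*
 * Hypothetical helper: estimate the metadata footprint in 4 KiB pages.
 * ASSUMPTION: the metadata is about 1/256th of RAM, as the text states.
 */
static uint64_t pamt_pages_estimate(uint64_t ram_bytes)
{
	uint64_t pamt_bytes;

	assert(ram_bytes % SKETCH_PAGE_SIZE == 0);	/* RAM is page-granular */
	pamt_bytes = ram_bytes / 256;
	/* Round up to whole pages. */
	return (pamt_bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

For 64 GiB of RAM the estimate is 65536 pages (256 MiB), the same order of magnitude as the 65667 pages in the sample dmesg. The RAM size of the machine that produced that log is not stated, so this is only a plausibility check.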
+
+TDX requires all boot-time verified logical CPUs to remain present until
+machine reset. If kernel receives ACPI CPU hot-removal event, assume the
+kernel cannot continue to work normally so just BUG().
+
+Note TDX works with CPU logical online/offline, thus the kernel still
+allows a logical CPU to be taken offline and brought online again.
+
+Memory Hotplug
+~~~~~~~~~~~~~~
+
+The TDX module reports a list of "Convertible Memory Regions" (CMRs) to
+indicate which memory regions are TDX-capable. Those regions are
+generated by BIOS and verified by the MCHECK so that they are truly
+present during platform boot and can meet security guarantee.
+
+This means TDX doesn't work with ACPI memory hot-add. A non-buggy BIOS
+should never deliver ACPI memory hot-add event to the kernel. Such event
+is reported as BIOS bug and the hot-added memory is rejected.
+
+TDX also doesn't work with ACPI memory hot-removal. If kernel receives
+ACPI memory hot-removal event, assume the kernel cannot continue to work
+normally so just BUG().
+
+Also, the kernel needs to choose which TDX-capable regions to use as TDX
+memory and pass those regions to the TDX module when it gets initialized.
+Once they are passed to the TDX module, the TDX-usable memory regions are
+fixed during module's lifetime.
+
+To avoid having to modify the page allocator to distinguish TDX and
+non-TDX memory allocation, current kernel guarantees all pages managed by
+the page allocator are TDX memory. This means any hot-added memory to
+the page allocator will break such guarantee thus should be prevented.
+
+There are basically two memory hot-add cases that need to be prevented:
+ACPI memory hot-add and driver managed memory hot-add. The kernel
+rejects the driver managed memory hot-add too when TDX is enabled by
+BIOS. For instance, dmesg shows below error when using kmem driver to
+add a legacy PMEM as system RAM:
+
+| [..] tdx: Unable to add memory [0x580000000, 0x600000000) on TDX enabled platform.
+| [..] kmem dax0.0: mapping0: 0x580000000-0x5ffffffff memory add failed
+
+However, adding new memory to ZONE_DEVICE should not be prevented as
+those pages are not managed by the page allocator. Therefore,
+memremap_pages() variants are still allowed although they internally
+also use memory hotplug functions.
+
+Kexec()
+~~~~~~~
+
+TDX (and MKTME) doesn't guarantee cache coherency among different KeyIDs.
+If the TDX module is ever initialized, the kernel needs to flush dirty
+cachelines associated with any TDX private KeyID, otherwise they may
+silently corrupt the new kernel.
+
+Similar to SME support, the kernel uses wbinvd() to flush cache in
+stop_this_cpu().
+
+The current TDX module architecture doesn't play nicely with kexec().
+The TDX module can only be initialized once during its lifetime, and
+there is no SEAMCALL to reset the module to give a new clean slate to
+the new kernel. Therefore, ideally, if the module is ever initialized,
+it's better to shut down the module. The new kernel won't be able to
+use TDX anyway (as it needs to go through the TDX module initialization
+process which will fail immediately at the first step).
+
+However, there's no guarantee CPU is in VMX operation during kexec(), so
+it's impractical to shut down the module. Current kernel just leaves the
+module in open state.
+
+TDX Guest Support
+=================
 Since the host cannot directly access guest registers or memory, much
 normal functionality of a hypervisor must be moved into the guest. This is
 implemented using a Virtualization Exception (#VE) that is handled by the
@@ -20,7 +188,7 @@ TDX includes new hypercall-like mechanisms for communicating from the
 guest to the hypervisor or the TDX module.

 New TDX Exceptions
-==================
+------------------

 TDX guests behave differently from bare-metal and traditional VMX guests.
 In TDX guests, otherwise normal instructions or memory accesses can cause
@@ -30,7 +198,7 @@ Instructions marked with an '*' conditionally cause exceptions. The
 details for these instructions are discussed below.

 Instruction-based #VE
----------------------
+~~~~~~~~~~~~~~~~~~~~~

 - Port I/O (INS, OUTS, IN, OUT)
 - HLT
@@ -41,7 +209,7 @@ Instruction-based #VE
 - CPUID*

 Instruction-based #GP
----------------------
+~~~~~~~~~~~~~~~~~~~~~

 - All VMX instructions: INVEPT, INVVPID, VMCLEAR, VMFUNC, VMLAUNCH,
   VMPTRLD, VMPTRST, VMREAD, VMRESUME, VMWRITE, VMXOFF, VMXON
@@ -52,7 +220,7 @@ Instruction-based #GP
 - RDMSR*,WRMSR*

 RDMSR/WRMSR Behavior
---------------------
+~~~~~~~~~~~~~~~~~~~~

 MSR access behavior falls into three categories:

@@ -73,7 +241,7 @@ trapping and handling in the TDX module. Other than possibly being slow,
 these MSRs appear to function just as they would on bare metal.

 CPUID Behavior
---------------
+~~~~~~~~~~~~~~

 For some CPUID leaves and sub-leaves, the virtualized bit fields of CPUID
 return values (in guest EAX/EBX/ECX/EDX) are configurable by the
@@ -93,7 +261,7 @@ not know how to handle. The guest kernel may ask the hypervisor for the
 value with a hypercall.

 #VE on Memory Accesses
-======================
+----------------------

 There are essentially two classes of TDX memory: private and shared.
 Private memory receives full TDX protections. Its content is protected
@@ -107,7 +275,7 @@ entries. This helps ensure that a guest does not place sensitive
 information in shared memory, exposing it to the untrusted hypervisor.

 #VE on Shared Memory
---------------------
+~~~~~~~~~~~~~~~~~~~~

 Access to shared mappings can cause a #VE. The hypervisor ultimately
 controls whether a shared memory access causes a #VE, so the guest must be
@@ -127,7 +295,7 @@ be careful not to access device MMIO regions unless it is also prepared to
 handle a #VE.

 #VE on Private Pages
---------------------
+~~~~~~~~~~~~~~~~~~~~

 An access to private mappings can also cause a #VE. Since all kernel
 memory is also private memory, the kernel might theoretically need to
@@ -145,7 +313,7 @@ The hypervisor is permitted to unilaterally move accepted pages to a
 to handle the exception.

 Linux #VE handler
-=================
+-----------------

 Just like page faults or #GP's, #VE exceptions can be either handled or be
 fatal. Typically, an unhandled userspace #VE results in a SIGSEGV.
@@ -167,7 +335,7 @@ While the block is in place, any #VE is elevated to a double fault (#DF)
 which is not recoverable.

 MMIO handling
-=============
+-------------

 In non-TDX VMs, MMIO is usually implemented by giving a guest access to a
 mapping which will cause a VMEXIT on access, and then the hypervisor
@@ -189,7 +357,7 @@ MMIO access via other means (like structure overlays) may result in an
 oops.

 Shared Memory Conversions
-=========================
+-------------------------

 All TDX guest memory starts out as private at boot. This memory can not
 be accessed by the hypervisor. However, some kernel users like device