From patchwork Mon Nov 21 00:26:32 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Kai"
X-Patchwork-Id: 13050211
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
 dave.hansen@intel.com, dan.j.williams@intel.com,
 rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
 ying.huang@intel.com, reinette.chatre@intel.com, len.brown@intel.com,
 tony.luck@intel.com, peterz@infradead.org, ak@linux.intel.com,
 isaku.yamahata@intel.com, chao.gao@intel.com,
 sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
 sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 10/20] x86/virt/tdx: Use all system memory when
 initializing TDX module as TDX memory
Date: Mon, 21 Nov 2022 13:26:32 +1300
Message-Id: <9b545148275b14a8c7edef1157f8ec44dc8116ee.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1

TDX reports a list of "Convertible Memory Regions" (CMRs) to indicate
all memory regions that can possibly be used by the TDX module, but they
are not automatically usable by the TDX module. As a step of
initializing the TDX module, the kernel needs to choose a list of memory
regions (from among the convertible memory regions) that the TDX module
can use and pass those regions to the TDX module. Once this is done,
those "TDX-usable" memory regions are fixed for the module's lifetime.
No more TDX-usable memory can be added to the TDX module after that.

The initial support of TDX guests will only allocate TDX guest memory
from the global page allocator. To keep things simple, this initial
implementation simply guarantees all pages in the page allocator are TDX
memory. To achieve this, use all system memory in the core-mm at the
time of initializing the TDX module as TDX memory, and in the meantime,
refuse to add any non-TDX memory via memory hotplug.

Specifically, walk through all memory regions managed by memblock and
add them to a global list of "TDX-usable" memory regions, which is a
fixed list after the module initialization (or empty if initialization
fails). To reject non-TDX memory in memory hotplug, add an additional
check in arch_add_memory() to check whether the new region is covered by
any region in the "TDX-usable" memory region list.

Note this requires all memory regions in memblock to be TDX convertible
memory when initializing the TDX module. This is true in practice if no
new memory has been hot-added before initializing the TDX module, since
in practice all boot-time present DIMMs are TDX convertible memory. If
any new memory has been hot-added, then initializing the TDX module will
fail because that memory region is not covered by any CMR.

This can be enhanced in the future, e.g. by allowing adding non-TDX
memory to a separate NUMA node. In this case, the "TDX-capable" nodes
and the "non-TDX-capable" nodes can co-exist, but the kernel/userspace
needs to guarantee memory pages for TDX guests are always allocated from
the "TDX-capable" nodes.

Note TDX assumes convertible memory is always physically present during
the machine's runtime. A non-buggy BIOS should never support hot-removal
of any convertible memory. This implementation doesn't handle ACPI
memory removal but depends on the BIOS to behave correctly.

Signed-off-by: Kai Huang
---

v6 -> v7:
 - Changed to use all system memory in memblock at the time of
   initializing the TDX module as TDX memory
 - Added memory hotplug support

---
 arch/x86/Kconfig            |   1 +
 arch/x86/include/asm/tdx.h  |   3 +
 arch/x86/mm/init_64.c       |  10 ++
 arch/x86/virt/vmx/tdx/tdx.c | 183 ++++++++++++++++++++++++++++++++++++
 4 files changed, 197 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index dd333b46fafb..b36129183035 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1959,6 +1959,7 @@ config INTEL_TDX_HOST
 	depends on X86_64
 	depends on KVM_INTEL
 	depends on X86_X2APIC
+	select ARCH_KEEP_MEMBLOCK
 	help
 	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
 	  host and certain physical attacks. This option enables necessary TDX
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index d688228f3151..71169ecefabf 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -111,9 +111,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 #ifdef CONFIG_INTEL_TDX_HOST
 bool platform_tdx_enabled(void);
 int tdx_enable(void);
+bool tdx_cc_memory_compatible(unsigned long start_pfn, unsigned long end_pfn);
 #else	/* !CONFIG_INTEL_TDX_HOST */
 static inline bool platform_tdx_enabled(void) { return false; }
 static inline int tdx_enable(void) { return -ENODEV; }
+static inline bool tdx_cc_memory_compatible(unsigned long start_pfn,
+		unsigned long end_pfn) { return true; }
 #endif	/* CONFIG_INTEL_TDX_HOST */
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3f040c6e5d13..900341333d7e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -55,6 +55,7 @@
 #include
 #include
 #include
+#include
 
 #include "mm_internal.h"
 
@@ -968,6 +969,15 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
+	/*
+	 * For now if TDX is enabled, all pages in the page allocator
+	 * must be TDX memory, which is a fixed set of memory regions
+	 * that are passed to the TDX module. Reject the new region
+	 * if it is not TDX memory to guarantee the above is true.
+	 */
+	if (!tdx_cc_memory_compatible(start_pfn, start_pfn + nr_pages))
+		return -EINVAL;
+
 	init_memory_mapping(start, start + size, params->pgprot);
 
 	return add_pages(nid, start_pfn, nr_pages, params);
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 43227af25e44..32af86e31c47 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -16,6 +16,11 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include
+#include
 #include
 #include
 #include
@@ -34,6 +39,13 @@ enum tdx_module_status_t {
 	TDX_MODULE_SHUTDOWN,
 };
 
+struct tdx_memblock {
+	struct list_head list;
+	unsigned long start_pfn;
+	unsigned long end_pfn;
+	int nid;
+};
+
 static u32 tdx_keyid_start __ro_after_init;
 static u32 tdx_keyid_num __ro_after_init;
 
@@ -46,6 +58,9 @@ static struct tdsysinfo_struct tdx_sysinfo;
 static struct cmr_info tdx_cmr_array[MAX_CMRS] __aligned(CMR_INFO_ARRAY_ALIGNMENT);
 static int tdx_cmr_num;
 
+/* All TDX-usable memory regions */
+static LIST_HEAD(tdx_memlist);
+
 /*
  * Detect TDX private KeyIDs to see whether TDX has been enabled by the
  * BIOS. Both initializing the TDX module and running TDX guest require
@@ -329,6 +344,107 @@ static int tdx_get_sysinfo(void)
 	return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
 }
 
+/* Check whether the given pfn range is covered by any CMR or not. */
+static bool pfn_range_covered_by_cmr(unsigned long start_pfn,
+				     unsigned long end_pfn)
+{
+	int i;
+
+	for (i = 0; i < tdx_cmr_num; i++) {
+		struct cmr_info *cmr = &tdx_cmr_array[i];
+		unsigned long cmr_start_pfn;
+		unsigned long cmr_end_pfn;
+
+		cmr_start_pfn = cmr->base >> PAGE_SHIFT;
+		cmr_end_pfn = (cmr->base + cmr->size) >> PAGE_SHIFT;
+
+		if (start_pfn >= cmr_start_pfn && end_pfn <= cmr_end_pfn)
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * Add a memory region on a given node as a TDX memory block. The caller
+ * must make sure all memory regions are added in address ascending order
+ * and don't overlap.
+ */
+static int add_tdx_memblock(unsigned long start_pfn, unsigned long end_pfn,
+			    int nid)
+{
+	struct tdx_memblock *tmb;
+
+	tmb = kmalloc(sizeof(*tmb), GFP_KERNEL);
+	if (!tmb)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&tmb->list);
+	tmb->start_pfn = start_pfn;
+	tmb->end_pfn = end_pfn;
+	tmb->nid = nid;
+
+	list_add_tail(&tmb->list, &tdx_memlist);
+	return 0;
+}
+
+static void free_tdx_memory(void)
+{
+	while (!list_empty(&tdx_memlist)) {
+		struct tdx_memblock *tmb = list_first_entry(&tdx_memlist,
+				struct tdx_memblock, list);
+
+		list_del(&tmb->list);
+		kfree(tmb);
+	}
+}
+
+/*
+ * Add all memblock memory regions to the @tdx_memlist as TDX memory.
+ * The caller must have called get_online_mems() beforehand.
+ */
+static int build_tdx_memory(void)
+{
+	unsigned long start_pfn, end_pfn;
+	int i, nid, ret;
+
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
+		/*
+		 * The first 1MB may not be reported as TDX convertible
+		 * memory. Manually exclude it from TDX memory.
+		 *
+		 * This is fine as the first 1MB is already reserved in
+		 * reserve_real_mode() and won't end up in ZONE_DMA as
+		 * free pages anyway.
+		 */
+		start_pfn = max(start_pfn, (unsigned long)SZ_1M >> PAGE_SHIFT);
+		if (start_pfn >= end_pfn)
+			continue;
+
+		/* Verify memory is truly TDX convertible memory */
+		if (!pfn_range_covered_by_cmr(start_pfn, end_pfn)) {
+			pr_info("Memory region [0x%lx, 0x%lx) is not TDX convertible memory.\n",
+				start_pfn << PAGE_SHIFT,
+				end_pfn << PAGE_SHIFT);
+			return -EINVAL;
+		}
+
+		/*
+		 * Add the memory regions as TDX memory. memblock already
+		 * guarantees that its regions are in address ascending
+		 * order and don't overlap.
+		 */
+		ret = add_tdx_memblock(start_pfn, end_pfn, nid);
+		if (ret)
+			goto err;
+	}
+
+	return 0;
+err:
+	free_tdx_memory();
+	return ret;
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -357,12 +473,56 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out;
 
+	/*
+	 * All memory regions that can be used by the TDX module must be
+	 * passed to the TDX module during the module initialization.
+	 * Once this is done, all "TDX-usable" memory regions are fixed
+	 * during the module's runtime.
+	 *
+	 * The initial support of TDX guests only allocates memory from
+	 * the global page allocator. To keep things simple, for now
+	 * just make sure all pages in the page allocator are TDX memory.
+	 *
+	 * To achieve this, use all system memory in the core-mm at the
+	 * time of initializing the TDX module as TDX memory, and in the
+	 * meantime, reject any new memory in memory hot-add.
+	 *
+	 * This works as in practice, all boot-time present DIMMs are TDX
+	 * convertible memory. However if any new memory is hot-added
+	 * before initializing the TDX module, the initialization will
+	 * fail because that memory is not covered by any CMR.
+	 *
+	 * This can be enhanced in the future, e.g. by allowing adding or
+	 * onlining non-TDX memory to a separate node, in which case the
+	 * "TDX-capable" nodes and the "non-TDX-capable" nodes can exist
+	 * together -- the userspace/kernel just needs to make sure pages
+	 * for TDX guests always come from those "TDX-capable" nodes.
+	 *
+	 * Build the list of TDX memory regions as mentioned above so
+	 * they can be passed to the TDX module later.
+	 */
+	get_online_mems();
+
+	ret = build_tdx_memory();
+	if (ret)
+		goto out;
 	/*
 	 * Return -EINVAL until all steps of TDX module initialization
 	 * process are done.
 	 */
 	ret = -EINVAL;
 out:
+	/*
+	 * Memory hotplug checks the hot-added memory region against the
+	 * @tdx_memlist to see if the region is TDX memory.
+	 *
+	 * Do put_online_mems() here to make sure any modification to
+	 * @tdx_memlist is done while holding the memory hotplug read
+	 * lock, so that the memory hotplug path can just check the
+	 * @tdx_memlist without holding the @tdx_module_lock, which may
+	 * cause deadlock.
+	 */
+	put_online_mems();
 	return ret;
 }
 
@@ -485,3 +645,26 @@ int tdx_enable(void)
 	return ret;
 }
 EXPORT_SYMBOL_GPL(tdx_enable);
+
+/*
+ * Check whether the given range is TDX memory. Must be called between
+ * mem_hotplug_begin()/mem_hotplug_done().
+ */
+bool tdx_cc_memory_compatible(unsigned long start_pfn, unsigned long end_pfn)
+{
+	struct tdx_memblock *tmb;
+
+	/* An empty list means TDX has not been enabled successfully */
+	if (list_empty(&tdx_memlist))
+		return true;
+
+	list_for_each_entry(tmb, &tdx_memlist, list) {
+		/*
+		 * The new range is TDX memory if it is fully covered
+		 * by any TDX memory block.
+		 */
+		if (start_pfn >= tmb->start_pfn && end_pfn <= tmb->end_pfn)
+			return true;
+	}
+	return false;
+}
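
The containment test that this patch applies in both
pfn_range_covered_by_cmr() and tdx_cc_memory_compatible() can be tried
outside the kernel. The following is a minimal user-space sketch in C,
not the kernel code itself: struct pfn_range, range_covered() and
hotplug_allowed() are hypothetical names invented for illustration, and
a plain array stands in for the tdx_memlist linked list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's struct tdx_memblock. */
struct pfn_range {
	unsigned long start_pfn;	/* inclusive */
	unsigned long end_pfn;		/* exclusive */
};

/*
 * Return true if [start_pfn, end_pfn) lies entirely inside a single
 * entry of @ranges. This is the same per-entry comparison the patch
 * uses; a range that straddles two entries is not covered.
 */
bool range_covered(const struct pfn_range *ranges, size_t nr,
		   unsigned long start_pfn, unsigned long end_pfn)
{
	size_t i;

	assert(start_pfn < end_pfn);

	for (i = 0; i < nr; i++) {
		if (start_pfn >= ranges[i].start_pfn &&
		    end_pfn <= ranges[i].end_pfn)
			return true;
	}

	return false;
}

/*
 * Mirror the hotplug policy described in the patch: an empty list
 * means TDX was not enabled, so every range is accepted; otherwise
 * the new range must be fully covered by one TDX-usable entry.
 */
bool hotplug_allowed(const struct pfn_range *ranges, size_t nr,
		     unsigned long start_pfn, unsigned long end_pfn)
{
	if (nr == 0)
		return true;

	return range_covered(ranges, nr, start_pfn, end_pfn);
}
```

With entries [0x100, 0x8000) and [0x8000, 0x10000), a request for
[0x7000, 0x9000) is rejected because no single entry contains it, even
though the two entries together do. A request above 0x10000 is rejected
as well, while the same request against an empty list is accepted.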