From patchwork Mon Mar 6 14:13:50 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Kai"
X-Patchwork-Id: 13161220
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, dave.hansen@intel.com, peterz@infradead.org,
 tglx@linutronix.de, seanjc@google.com, pbonzini@redhat.com,
 dan.j.williams@intel.com, rafael.j.wysocki@intel.com,
 kirill.shutemov@linux.intel.com, ying.huang@intel.com,
 reinette.chatre@intel.com, len.brown@intel.com, tony.luck@intel.com,
 ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
 sathyanarayanan.kuppuswamy@linux.intel.com, david@redhat.com,
 bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com,
 kai.huang@intel.com
Subject: [PATCH v10 05/16] x86/virt/tdx: Add skeleton to enable TDX on demand
Date: Tue, 7 Mar 2023 03:13:50 +1300
X-Mailer: git-send-email 2.39.2
Sender: owner-linux-mm@kvack.org
Precedence: bulk

To enable TDX, the kernel needs to initialize TDX from two perspectives:
1) Do a set of SEAMCALLs to initialize the TDX module to make it ready
to create and run TDX guests; 2) Do the per-cpu initialization SEAMCALL
on a logical cpu before the kernel makes any other SEAMCALLs on that
cpu (including those involved in module initialization and in running
TDX guests).

The TDX module can be initialized only once in its lifetime.  Instead
of always initializing it at boot time, this implementation chooses an
"on demand" approach and defers initializing TDX until there is a real
need (e.g. when requested by KVM).
This approach has the following pros:

1) It avoids consuming the memory that must be allocated by the kernel
and given to the TDX module as metadata (~1/256th of the TDX-usable
memory), and also saves the CPU cycles of initializing the TDX module
(and the metadata) when TDX is not used at all.

2) The TDX module design allows it to be updated while the system is
running.  The update procedure shares quite a few steps with this "on
demand" initialization mechanism.  The hope is that much of the "on
demand" mechanism can be shared with a future "update" mechanism.  A
boot-time TDX module implementation would not be able to share much
code with the update mechanism.

3) Making a SEAMCALL requires VMX to be enabled.  Currently, only the
KVM code mucks with VMX enabling.  If the TDX module were to be
initialized separately from KVM (like at boot), the boot code would
need to be taught how to muck with VMX enabling and KVM would need to
be taught how to cope with that.  Making KVM itself responsible for TDX
initialization lets the rest of the kernel stay blissfully unaware of
VMX.

Similar to module initialization, also make the per-cpu initialization
"on demand", as it also depends on VMX being enabled.

Add two functions, tdx_enable() and tdx_cpu_enable(), to enable the TDX
module and to enable TDX on the local cpu respectively.  For now
tdx_enable() is a placeholder.  The TODO list will be pared down as
functionality is added.

In tdx_enable(), use a state machine protected by a mutex to make sure
the initialization is only done once, as tdx_enable() can be called
multiple times (e.g. the KVM module can be reloaded) and may be called
concurrently by other kernel components in the future.

The per-cpu initialization on each cpu can only be done once during the
module's lifetime.  Use a per-cpu variable to track its status to make
sure it is only done once in tdx_cpu_enable().

Also, a SEAMCALL to do TDX module global initialization must be done
once on any logical cpu before any per-cpu initialization SEAMCALL.
Do it inside tdx_cpu_enable() too (if it hasn't been done).

tdx_enable() can potentially invoke SEAMCALLs on any online cpus.  The
per-cpu initialization must be done before those SEAMCALLs are invoked
on some cpu.  To keep things simple, in tdx_cpu_enable(), always do the
per-cpu initialization regardless of whether the TDX module has been
initialized or not.  And in tdx_enable(), don't call tdx_cpu_enable()
but assume the caller has disabled CPU hotplug and done VMXON and
tdx_cpu_enable() on all online cpus before calling tdx_enable().

Signed-off-by: Kai Huang
Reviewed-by: Isaku Yamahata
Signed-off-by: Isaku Yamahata
---

v9 -> v10:
 - Merged the patch to handle per-cpu initialization to this patch to
   tell the story better.
 - Changed how to handle the per-cpu initialization to only provide a
   tdx_cpu_enable() function to let the user of TDX do it when the
   user wants to run TDX code on a certain cpu.
 - Changed tdx_enable() to not call cpus_read_lock() explicitly, but
   call lockdep_assert_cpus_held() to assume the caller has done that.
 - Improved comments around tdx_enable() and tdx_cpu_enable().
 - Improved changelog to tell the story better accordingly.

v8 -> v9:
 - Removed detailed TODO list in the changelog (Dave).
 - Added back steps to do module global initialization and per-cpu
   initialization in the TODO list comment.
 - Moved the 'enum tdx_module_status_t' from tdx.c to local tdx.h

v7 -> v8:
 - Refined changelog (Dave).
 - Removed "all BIOS-enabled cpus" related code (Peter/Thomas/Dave).
 - Add a "TODO list" comment in init_tdx_module() to list all steps of
   initializing the TDX Module to tell the story (Dave).
 - Made tdx_enable() universally return -EINVAL, and removed nonsense
   comments (Dave).
 - Simplified __tdx_enable() to only handle success or failure.
 - TDX_MODULE_SHUTDOWN -> TDX_MODULE_ERROR
 - Removed TDX_MODULE_NONE (not loaded) as it is not necessary.
 - Improved comments (Dave).
 - Pointed out 'tdx_module_status' is software thing (Dave).

v6 -> v7:
 - No change.

v5 -> v6:
 - Added code to set status to TDX_MODULE_NONE if TDX module is not
   loaded (Chao)
 - Added Chao's Reviewed-by.
 - Improved comments around cpus_read_lock().

- v3->v5 (no feedback on v4):
 - Removed the check that SEAMRR and TDX KeyID have been detected on
   all present cpus.
 - Removed tdx_detect().
 - Added num_online_cpus() to MADT-enabled CPUs check within the CPU
   hotplug lock and return early with error message.
 - Improved dmesg printing for TDX module detection and initialization.

---
 arch/x86/include/asm/tdx.h  |   4 +
 arch/x86/virt/vmx/tdx/tdx.c | 182 ++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  25 +++++
 3 files changed, 211 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index b489b5b9de5d..112a5b9bd5cd 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -102,8 +102,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 
 #ifdef CONFIG_INTEL_TDX_HOST
 bool platform_tdx_enabled(void);
+int tdx_cpu_enable(void);
+int tdx_enable(void);
 #else	/* !CONFIG_INTEL_TDX_HOST */
 static inline bool platform_tdx_enabled(void) { return false; }
+static inline int tdx_cpu_enable(void) { return -EINVAL; }
+static inline int tdx_enable(void)  { return -EINVAL; }
 #endif	/* CONFIG_INTEL_TDX_HOST */
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index b65b838f3b5d..29127cb70f51 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -13,6 +13,10 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include
 #include
 #include
 #include
@@ -22,6 +26,18 @@ static u32 tdx_global_keyid __ro_after_init;
 static u32 tdx_guest_keyid_start __ro_after_init;
 static u32 tdx_nr_guest_keyids __ro_after_init;
 
+static unsigned int tdx_global_init_status;
+static DEFINE_SPINLOCK(tdx_global_init_lock);
+#define TDX_GLOBAL_INIT_DONE	_BITUL(0)
+#define TDX_GLOBAL_INIT_FAILED	_BITUL(1)
+
+static DEFINE_PER_CPU(unsigned int, tdx_lp_init_status);
+#define TDX_LP_INIT_DONE	_BITUL(0)
+#define TDX_LP_INIT_FAILED	_BITUL(1)
+
+static enum tdx_module_status_t tdx_module_status;
+static DEFINE_MUTEX(tdx_module_lock);
+
 /*
  * Use tdx_global_keyid to indicate that TDX is uninitialized.
  * This is used in TDX initialization error paths to take it from
@@ -159,3 +175,169 @@ static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
 	put_cpu();
 	return ret;
 }
+
+static int try_init_module_global(void)
+{
+	int ret;
+
+	/*
+	 * The TDX module global initialization only needs to be done
+	 * once on any cpu.
+	 */
+	spin_lock(&tdx_global_init_lock);
+
+	if (tdx_global_init_status & TDX_GLOBAL_INIT_DONE) {
+		ret = tdx_global_init_status & TDX_GLOBAL_INIT_FAILED ?
+			-EINVAL : 0;
+		goto out;
+	}
+
+	/* All '0's are just unused parameters. */
+	ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL);
+
+	tdx_global_init_status = TDX_GLOBAL_INIT_DONE;
+	if (ret)
+		tdx_global_init_status |= TDX_GLOBAL_INIT_FAILED;
+out:
+	spin_unlock(&tdx_global_init_lock);
+
+	return ret;
+}
+
+/**
+ * tdx_cpu_enable - Enable TDX on local cpu
+ *
+ * Do one-time TDX module per-cpu initialization SEAMCALL (and TDX module
+ * global initialization SEAMCALL if not done) on local cpu to make this
+ * cpu be ready to run any other SEAMCALLs.
+ *
+ * Note this function must be called when preemption is not possible
+ * (i.e. via SMP call or in per-cpu thread).  It is not IRQ safe either
+ * (i.e. cannot be called in per-cpu thread and via SMP call from remote
+ * cpu simultaneously).
+ *
+ * Return 0 on success, otherwise errors.
+ */
+int tdx_cpu_enable(void)
+{
+	unsigned int lp_status;
+	int ret;
+
+	if (!platform_tdx_enabled())
+		return -EINVAL;
+
+	lp_status = __this_cpu_read(tdx_lp_init_status);
+
+	/* Already done */
+	if (lp_status & TDX_LP_INIT_DONE)
+		return lp_status & TDX_LP_INIT_FAILED ? -EINVAL : 0;
+
+	/*
+	 * The TDX module global initialization is the very first step
+	 * to enable TDX.  Need to do it first (if hasn't been done)
+	 * before doing the per-cpu initialization.
+	 */
+	ret = try_init_module_global();
+
+	/*
+	 * If the module global initialization failed, there's no point
+	 * to do the per-cpu initialization.  Just mark it as done but
+	 * failed.
+	 */
+	if (ret)
+		goto update_status;
+
+	/* All '0's are just unused parameters */
+	ret = seamcall(TDH_SYS_LP_INIT, 0, 0, 0, 0, NULL, NULL);
+
+update_status:
+	lp_status = TDX_LP_INIT_DONE;
+	if (ret)
+		lp_status |= TDX_LP_INIT_FAILED;
+
+	this_cpu_write(tdx_lp_init_status, lp_status);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdx_cpu_enable);
+
+static int init_tdx_module(void)
+{
+	/*
+	 * TODO:
+	 *
+	 *  - Get TDX module information and TDX-capable memory regions.
+	 *  - Build the list of TDX-usable memory regions.
+	 *  - Construct a list of "TD Memory Regions" (TDMRs) to cover
+	 *    all TDX-usable memory regions.
+	 *  - Configure the TDMRs and the global KeyID to the TDX module.
+	 *  - Configure the global KeyID on all packages.
+	 *  - Initialize all TDMRs.
+	 *
+	 *  Return error before all steps are done.
+	 */
+	return -EINVAL;
+}
+
+static int __tdx_enable(void)
+{
+	int ret;
+
+	ret = init_tdx_module();
+	if (ret) {
+		pr_err("TDX module initialization failed (%d)\n", ret);
+		tdx_module_status = TDX_MODULE_ERROR;
+		/*
+		 * Just return one universal error code.
+		 * For now the caller cannot recover anyway.
+		 */
+		return -EINVAL;
+	}
+
+	pr_info("TDX module initialized.\n");
+	tdx_module_status = TDX_MODULE_INITIALIZED;
+
+	return 0;
+}
+
+/**
+ * tdx_enable - Enable TDX module to make it ready to run TDX guests
+ *
+ * This function assumes the caller has: 1) held read lock of CPU hotplug
+ * lock to prevent any new cpu from becoming online; 2) done both VMXON
+ * and tdx_cpu_enable() on all online cpus.
+ *
+ * This function can be called in parallel by multiple callers.
+ *
+ * Return 0 if TDX is enabled successfully, otherwise error.
+ */
+int tdx_enable(void)
+{
+	int ret;
+
+	if (!platform_tdx_enabled())
+		return -EINVAL;
+
+	lockdep_assert_cpus_held();
+
+	mutex_lock(&tdx_module_lock);
+
+	switch (tdx_module_status) {
+	case TDX_MODULE_UNKNOWN:
+		ret = __tdx_enable();
+		break;
+	case TDX_MODULE_INITIALIZED:
+		/* Already initialized, great, tell the caller. */
+		ret = 0;
+		break;
+	default:
+		/* Failed to initialize in the previous attempts */
+		ret = -EINVAL;
+		break;
+	}
+
+	mutex_unlock(&tdx_module_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdx_enable);
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 48ad1a1ba737..4d6220e86ccf 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -4,6 +4,31 @@
 
 #include
 
+/*
+ * This file contains both macros and data structures defined by the TDX
+ * architecture and Linux defined software data structures and functions.
+ * The two should not be mixed together for better readability.  The
+ * architectural definitions come first.
+ */
+
+/*
+ * TDX module SEAMCALL leaf functions
+ */
+#define TDH_SYS_INIT		33
+#define TDH_SYS_LP_INIT		35
+
+/*
+ * Do not put any hardware-defined TDX structure representations below
+ * this comment!
+ */
+
+/* Kernel defined TDX module status during module initialization. */
+enum tdx_module_status_t {
+	TDX_MODULE_UNKNOWN,
+	TDX_MODULE_INITIALIZED,
+	TDX_MODULE_ERROR
+};
+
 struct tdx_module_output;
 u64 __seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
 	       struct tdx_module_output *out);
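
Illustration only, not part of the patch: the once-only behaviour that tdx_enable() gets from its mutex-protected state machine can be tried out in userspace with the sketch below.  It assumes a pthread mutex in place of the kernel mutex and a fake init routine in place of init_tdx_module(); every model_* name is hypothetical and does not exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins; none of these names are in the patch. */
enum model_status { MODEL_UNKNOWN, MODEL_INITIALIZED, MODEL_ERROR };

static enum model_status model_status = MODEL_UNKNOWN;
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static int model_init_calls;	/* how many times the fake init ran */
static int model_init_result;	/* what the fake init returns: 0 or -errno */

/* Stand-in for init_tdx_module(): only counts its invocations. */
static int model_init_module(void)
{
	model_init_calls++;
	return model_init_result;
}

/* Same shape as tdx_enable(): run the init at most once, remember the outcome. */
int model_enable(void)
{
	int ret;

	pthread_mutex_lock(&model_lock);

	switch (model_status) {
	case MODEL_UNKNOWN:
		ret = model_init_module();
		model_status = ret ? MODEL_ERROR : MODEL_INITIALIZED;
		if (ret)
			ret = -EINVAL;	/* one universal error code */
		break;
	case MODEL_INITIALIZED:
		/* Already initialized, nothing to redo. */
		ret = 0;
		break;
	default:
		/* A previous attempt failed; do not retry. */
		ret = -EINVAL;
		break;
	}

	pthread_mutex_unlock(&model_lock);

	return ret;
}

/* Test helper: return to the initial state and choose the fake init result. */
void model_reset(int init_result)
{
	assert(init_result <= 0);

	pthread_mutex_lock(&model_lock);
	model_status = MODEL_UNKNOWN;
	model_init_calls = 0;
	model_init_result = init_result;
	pthread_mutex_unlock(&model_lock);
}

int model_init_count(void)
{
	return model_init_calls;
}
```

Calling model_enable() repeatedly after model_reset(0) runs the fake init only once and keeps returning 0; after model_reset(-5) the first call fails and later calls return -EINVAL without retrying, which is the "failed in a previous attempt" branch of the switch.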