From patchwork Mon Jun 26 14:12:41 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Kai"
X-Patchwork-Id: 13292980
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, x86@kernel.org, dave.hansen@intel.com,
 kirill.shutemov@linux.intel.com, tony.luck@intel.com,
 peterz@infradead.org, tglx@linutronix.de, bp@alien8.de, mingo@redhat.com,
 hpa@zytor.com, seanjc@google.com, pbonzini@redhat.com, david@redhat.com,
 dan.j.williams@intel.com, rafael.j.wysocki@intel.com, ashok.raj@intel.com,
 reinette.chatre@intel.com, len.brown@intel.com, ak@linux.intel.com,
 isaku.yamahata@intel.com, ying.huang@intel.com, chao.gao@intel.com,
 sathyanarayanan.kuppuswamy@linux.intel.com, nik.borisov@suse.com,
 bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com,
 kai.huang@intel.com
Subject: [PATCH v12 11/22] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions
Date: Tue, 27 Jun 2023 02:12:41 +1200
X-Mailer: git-send-email 2.40.1

Start to work through the multiple steps needed to construct a list of
"TD Memory Regions" (TDMRs) to cover all TDX-usable memory regions.

The kernel configures TDX-usable memory regions by passing a list of
TDMRs to the TDX module.  Each TDMR describes the base/size of a memory
region, the base/size of the associated Physical Address Metadata Table
(PAMT) and a list of reserved areas in the region.

As the first step, fill out a number of TDMRs to cover all TDX memory
regions.  To keep it simple, always try to use one TDMR for each memory
region.  For now, only set up the base/size of each TDMR.

Each TDMR must be 1G aligned and its size must be a multiple of 1G.
This implies that one TDMR could cover multiple memory regions.  If a
memory region spans a 1G boundary and its first part is already covered
by the previous TDMR, just use a new TDMR for the remaining part.

TDX only supports a limited number of TDMRs.  Disable TDX if all TDMRs
are consumed but there are more memory regions to cover.

There are fancier things that could be done like trying to merge
adjacent TDMRs.  This would allow more pathological memory layouts to be
supported.  But, current systems are not even close to exhausting the
existing TDMR resources in practice.  For now, keep it simple.

Signed-off-by: Kai Huang
Reviewed-by: Kirill A. Shutemov
Reviewed-by: Kuppuswamy Sathyanarayanan
Reviewed-by: Yuan Yao
---

v11 -> v12:
 - Improved comments around looping over TDX memblock to create TDMRs.
   (Dave).
 - Added code to pr_warn() when the number of consumed TDMRs approaches
   the maximum number of TDMRs (Dave).
 - BIT_ULL(30) -> SZ_1G (Kirill)
 - Removed unused TDMR_PFN_ALIGNMENT (Sathy)
 - Added tags from Kirill/Sathy

v10 -> v11:
 - No update

v9 -> v10:
 - No change.

v8 -> v9:
 - Added the last paragraph in the changelog (Dave).
 - Removed unnecessary type cast in tdmr_entry() (Dave).

---
 arch/x86/virt/vmx/tdx/tdx.c | 103 +++++++++++++++++++++++++++++++++++-
 arch/x86/virt/vmx/tdx/tdx.h |   3 ++
 2 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index e28615b60f9b..2ffc1517a93b 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -341,6 +341,102 @@ static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
 			tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
 }
 
+/* Get the TDMR from the list at the given index. */
+static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
+				    int idx)
+{
+	int tdmr_info_offset = tdmr_list->tdmr_sz * idx;
+
+	return (void *)tdmr_list->tdmrs + tdmr_info_offset;
+}
+
+#define TDMR_ALIGNMENT		SZ_1G
+#define TDMR_ALIGN_DOWN(_addr)	ALIGN_DOWN((_addr), TDMR_ALIGNMENT)
+#define TDMR_ALIGN_UP(_addr)	ALIGN((_addr), TDMR_ALIGNMENT)
+
+static inline u64 tdmr_end(struct tdmr_info *tdmr)
+{
+	return tdmr->base + tdmr->size;
+}
+
+/*
+ * Take the memory referenced in @tmb_list and populate the
+ * preallocated @tdmr_list, following all the special alignment
+ * and size rules for TDMR.
+ */
+static int fill_out_tdmrs(struct list_head *tmb_list,
+			  struct tdmr_info_list *tdmr_list)
+{
+	struct tdx_memblock *tmb;
+	int tdmr_idx = 0;
+
+	/*
+	 * Loop over TDX memory regions and fill out TDMRs to cover them.
+	 * To keep it simple, always try to use one TDMR to cover one
+	 * memory region.
+	 *
+	 * In practice TDX supports at least 64 TDMRs.  A 2-socket system
+	 * typically only consumes less than 10 of those.  This code is
+	 * dumb and simple and may use more TDMRs than is strictly
+	 * required.
+	 */
+	list_for_each_entry(tmb, tmb_list, list) {
+		struct tdmr_info *tdmr = tdmr_entry(tdmr_list, tdmr_idx);
+		u64 start, end;
+
+		start = TDMR_ALIGN_DOWN(PFN_PHYS(tmb->start_pfn));
+		end   = TDMR_ALIGN_UP(PFN_PHYS(tmb->end_pfn));
+
+		/*
+		 * A valid size indicates the current TDMR has already
+		 * been filled out to cover the previous memory region(s).
+		 */
+		if (tdmr->size) {
+			/*
+			 * Loop to the next if the current memory region
+			 * has already been fully covered.
+			 */
+			if (end <= tdmr_end(tdmr))
+				continue;
+
+			/* Otherwise, skip the already covered part. */
+			if (start < tdmr_end(tdmr))
+				start = tdmr_end(tdmr);
+
+			/*
+			 * Create a new TDMR to cover the current memory
+			 * region, or the remaining part of it.
+			 */
+			tdmr_idx++;
+			if (tdmr_idx >= tdmr_list->max_tdmrs) {
+				pr_warn("initialization failed: TDMRs exhausted.\n");
+				return -ENOSPC;
+			}
+
+			tdmr = tdmr_entry(tdmr_list, tdmr_idx);
+		}
+
+		tdmr->base = start;
+		tdmr->size = end - start;
+	}
+
+	/* @tdmr_idx is always the index of the last valid TDMR. */
+	tdmr_list->nr_consumed_tdmrs = tdmr_idx + 1;
+
+	/*
+	 * Warn early that kernel is about to run out of TDMRs.
+	 *
+	 * This is an indication that TDMR allocation has to be
+	 * reworked to be smarter to not run into an issue.
+	 */
+	if (tdmr_list->max_tdmrs - tdmr_list->nr_consumed_tdmrs < TDMR_NR_WARN)
+		pr_warn("consumed TDMRs reaching limit: %d used out of %d\n",
+				tdmr_list->nr_consumed_tdmrs,
+				tdmr_list->max_tdmrs);
+
+	return 0;
+}
+
 /*
  * Construct a list of TDMRs on the preallocated space in @tdmr_list
  * to cover all TDX memory regions in @tmb_list based on the TDX module
@@ -350,10 +446,15 @@ static int construct_tdmrs(struct list_head *tmb_list,
 			   struct tdmr_info_list *tdmr_list,
 			   struct tdsysinfo_struct *sysinfo)
 {
+	int ret;
+
+	ret = fill_out_tdmrs(tmb_list, tdmr_list);
+	if (ret)
+		return ret;
+
 	/*
 	 * TODO:
 	 *
-	 * - Fill out TDMRs to cover all TDX memory regions.
 	 * - Allocate and set up PAMTs for each TDMR.
 	 * - Designate reserved areas for each TDMR.
 	 *
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 193764afc602..3086f7ad0522 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -123,6 +123,9 @@ struct tdx_memblock {
 	unsigned long end_pfn;
 };
 
+/* Warn if kernel has less than TDMR_NR_WARN TDMRs after allocation */
+#define TDMR_NR_WARN 4
+
 struct tdmr_info_list {
 	void *tdmrs;	/* Flexible array to hold 'tdmr_info's */
 	int nr_consumed_tdmrs;	/* How many 'tdmr_info's are in use */
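
For anyone who wants to experiment with the covering logic outside the kernel, below is a rough userspace sketch of the same loop. It is not part of the patch and makes several simplifying assumptions: regions are given as sorted, non-overlapping byte ranges in a plain array rather than PFNs on a list_head, the TDMR array is zero-initialized by the caller, and exhaustion is reported as -1 instead of -ENOSPC. All names (sketch_region, sketch_tdmr, sketch_fill_out_tdmrs, SKETCH_ALIGNMENT) are hypothetical stand-ins and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ALIGNMENT (1ULL << 30)	/* 1G, the same value as SZ_1G */

/* Hypothetical userspace stand-ins for tdx_memblock and tdmr_info. */
struct sketch_region { uint64_t start, end; };	/* byte range [start, end) */
struct sketch_tdmr { uint64_t base, size; };

static uint64_t align_down_1g(uint64_t addr)
{
	return addr & ~(SKETCH_ALIGNMENT - 1);
}

static uint64_t align_up_1g(uint64_t addr)
{
	return (addr + SKETCH_ALIGNMENT - 1) & ~(SKETCH_ALIGNMENT - 1);
}

/*
 * Cover @nr_regions sorted, non-overlapping regions with 1G-aligned TDMRs.
 * @tdmrs must be zero-initialized and hold at least @max_tdmrs entries.
 * Returns the number of TDMRs consumed, or -1 if @max_tdmrs is too small.
 */
static int sketch_fill_out_tdmrs(const struct sketch_region *regions,
				 size_t nr_regions,
				 struct sketch_tdmr *tdmrs, int max_tdmrs)
{
	int idx = 0;
	size_t i;

	assert(max_tdmrs > 0);

	for (i = 0; i < nr_regions; i++) {
		struct sketch_tdmr *tdmr = &tdmrs[idx];
		uint64_t start = align_down_1g(regions[i].start);
		uint64_t end = align_up_1g(regions[i].end);

		/* A non-zero size means this TDMR already covers something. */
		if (tdmr->size) {
			uint64_t covered = tdmr->base + tdmr->size;

			/* Region already fully covered: nothing to do. */
			if (end <= covered)
				continue;

			/* Skip the part the current TDMR already covers. */
			if (start < covered)
				start = covered;

			/* Move on to a fresh TDMR for the remainder. */
			if (++idx >= max_tdmrs)
				return -1;
			tdmr = &tdmrs[idx];
		}

		tdmr->base = start;
		tdmr->size = end - start;
	}

	/* idx is the index of the last TDMR that was filled out. */
	return idx + 1;
}
```

Feeding it a region that straddles the 1G boundary, followed by one that starts below 2G and ends just above it, should show the second region getting a new TDMR only for the part beyond 2G, which is the "remaining part" case described in the changelog. As in the patch, an empty region list still reports one consumed (empty) TDMR.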