Message ID | 927ec9871721d2a50f1aba7d1cf7c3be50e4f49b.1685887183.git.kai.huang@intel.com (mailing list archive)
---|---
State | New, archived
Series | TDX host kernel support
On 6/4/23 07:27, Kai Huang wrote:
> +	/*
> +	 * Loop over TDX memory regions and fill out TDMRs to cover them.
> +	 * To keep it simple, always try to use one TDMR to cover one
> +	 * memory region.
> +	 *
> +	 * In practice TDX1.0 supports 64 TDMRs, which is big enough to
> +	 * cover all memory regions in reality if the admin doesn't use
> +	 * 'memmap' to create a bunch of discrete memory regions.  When
> +	 * there's a real problem, enhancement can be done to merge TDMRs
> +	 * to reduce the final number of TDMRs.
> +	 */

Rather than focus in on one specific command-line parameter, let's just say:

	In practice TDX supports at least 64 TDMRs.  A 2-socket system
	typically only consumes <NUMBER> of those.  This code is dumb
	and simple and may use more TDMRs than is strictly required.

Let's also put a pr_warn() in here if we exceed, say 1/2 or maybe 3/4 of
the 64.  We'll hopefully start to get reports somewhat in advance if
systems get close to the limit.
On Wed, 2023-06-07 at 09:05 -0700, Dave Hansen wrote:
> On 6/4/23 07:27, Kai Huang wrote:
> > +	/*
> > +	 * Loop over TDX memory regions and fill out TDMRs to cover them.
> > +	 * To keep it simple, always try to use one TDMR to cover one
> > +	 * memory region.
> > +	 *
> > +	 * In practice TDX1.0 supports 64 TDMRs, which is big enough to
> > +	 * cover all memory regions in reality if the admin doesn't use
> > +	 * 'memmap' to create a bunch of discrete memory regions.  When
> > +	 * there's a real problem, enhancement can be done to merge TDMRs
> > +	 * to reduce the final number of TDMRs.
> > +	 */
> 
> Rather than focus in on one specific command-line parameter, let's just say:
> 
> 	In practice TDX supports at least 64 TDMRs.  A 2-socket system
> 	typically only consumes <NUMBER> of those.  This code is dumb
> 	and simple and may use more TDMRs than is strictly required.

Thanks, will do.  Will take a look at a machine to get the <NUMBER>.

> 
> Let's also put a pr_warn() in here if we exceed, say 1/2 or maybe 3/4 of
> the 64.  We'll hopefully start to get reports somewhat in advance if
> systems get close to the limit.

May I ask why this is useful?  The TDX module can only be initialized once,
so if not considering the module runtime update case, the kernel can only
get one of two results:

1) Succeed to initialize: consumed TDMRs doesn't exceed maximum TDMRs
2) Fail to initialize: consumed TDMRs exceeds maximum TDMRs

What's the value of pr_warn() to the user when consumed TDMRs exceeds some
threshold?

Anyway, if you want it, how does the below code look?

 static int fill_out_tdmrs(struct list_head *tmb_list,
 			  struct tdmr_info_list *tdmr_list)
 {
+	int consumed_tdmrs_threshold, tdmr_idx = 0;
 	struct tdx_memblock *tmb;
-	int tdmr_idx = 0;
 
 	/*
 	 * Loop over TDX memory regions and fill out TDMRs to cover them.
 	 * To keep it simple, always try to use one TDMR to cover one
 	 * memory region.
 	 *
-	 * In practice TDX1.0 supports 64 TDMRs, which is big enough to
-	 * cover all memory regions in reality if the admin doesn't use
-	 * 'memmap' to create a bunch of discrete memory regions.  When
-	 * there's a real problem, enhancement can be done to merge TDMRs
-	 * to reduce the final number of TDMRs.
+	 * In practice TDX supports at least 64 TDMRs.  A 2-socket system
+	 * typically only consumes <NUMBER> of those.  This code is dumb
+	 * and simple and may use more TDMRs than is strictly required.
+	 *
+	 * Also set a threshold of consumed TDMRs, and pr_warn() to warn
+	 * the user the system is getting close to the limit of supported
+	 * number of TDMRs if the number of consumed TDMRs exceeds the
+	 * threshold.
 	 */
+	consumed_tdmrs_threshold = tdmr_list->max_tdmrs * 3 / 4;
 	list_for_each_entry(tmb, tmb_list, list) {
 		struct tdmr_info *tdmr = tdmr_entry(tdmr_list, tdmr_idx);
 		u64 start, end;
@@ -463,6 +467,10 @@ static int fill_out_tdmrs(struct list_head *tmb_list,
 			return -ENOSPC;
 		}
 
+		if (tdmr_idx == consumed_tdmrs_threshold)
+			pr_warn("consumed TDMRs reaching limit: %d used (out of %d)\n",
+					tdmr_idx, tdmr_list->max_tdmrs);
+
 		tdmr = tdmr_entry(tdmr_list, tdmr_idx);
 	}
On 6/8/23 03:48, Huang, Kai wrote:
> > Let's also put a pr_warn() in here if we exceed, say 1/2 or maybe 3/4 of
> > the 64.  We'll hopefully start to get reports somewhat in advance if
> > systems get close to the limit.
> 
> May I ask why this is useful?  The TDX module can only be initialized once,
> so if not considering the module runtime update case, the kernel can only
> get one of two results:
> 
> 1) Succeed to initialize: consumed TDMRs doesn't exceed maximum TDMRs
> 2) Fail to initialize: consumed TDMRs exceeds maximum TDMRs
> 
> What's the value of pr_warn() to the user when consumed TDMRs exceeds some
> threshold?

Today, we're saying, "64 TDMRs ought to be enough for anybody!"

I'd actually kinda like to know if anybody starts building platforms that
get anywhere near using 64.  That way, we won't get a bug report that TDX
is broken and we'll have a fire drill.  We'll get a bug report that TDX is
complaining and we'll have some time to go fix it without anyone actually
being broken.

Maybe not even a pr_warn(), but something that's a bit ominous and has a
chance of getting users to act.
On Mon, Jun 05, 2023 at 02:27:24AM +1200, Kai Huang wrote:
> +#define TDMR_ALIGNMENT		BIT_ULL(30)

Nit: SZ_1G can be a little bit more readable here.

Anyway:

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
On 6/4/23 7:27 AM, Kai Huang wrote:
> Start to transit out the "multi-steps" to construct a list of "TD Memory
> Regions" (TDMRs) to cover all TDX-usable memory regions.
> 
> The kernel configures TDX-usable memory regions by passing a list of
> TDMRs "TD Memory Regions" (TDMRs) to the TDX module.  Each TDMR contains
> the information of the base/size of a memory region, the base/size of the
> associated Physical Address Metadata Table (PAMT) and a list of reserved
> areas in the region.
> 
> Do the first step to fill out a number of TDMRs to cover all TDX memory
> regions.  To keep it simple, always try to use one TDMR for each memory
> region.  As the first step only set up the base/size for each TDMR.

As a first step?

> 
> Each TDMR must be 1G aligned and the size must be in 1G granularity.
> This implies that one TDMR could cover multiple memory regions.  If a
> memory region spans the 1GB boundary and the former part is already
> covered by the previous TDMR, just use a new TDMR for the remaining
> part.
> 
> TDX only supports a limited number of TDMRs.  Disable TDX if all TDMRs
> are consumed but there is more memory region to cover.
> 
> There are fancier things that could be done like trying to merge
> adjacent TDMRs.  This would allow more pathological memory layouts to be
> supported.  But, current systems are not even close to exhausting the
> existing TDMR resources in practice.  For now, keep it simple.
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
> 
> v10 -> v11:
>  - No update
> 
> v9 -> v10:
>  - No change.
> 
> v8 -> v9:
> 
>  - Added the last paragraph in the changelog (Dave).
>  - Removed unnecessary type cast in tdmr_entry() (Dave).
> 
> 
> ---
>  arch/x86/virt/vmx/tdx/tdx.c | 94 ++++++++++++++++++++++++++++++++++++-
>  1 file changed, 93 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 7a20c72361e7..fa9fa8bc581a 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -385,6 +385,93 @@ static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
>  			tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
>  }
>  
> +/* Get the TDMR from the list at the given index. */
> +static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
> +				    int idx)
> +{
> +	int tdmr_info_offset = tdmr_list->tdmr_sz * idx;
> +
> +	return (void *)tdmr_list->tdmrs + tdmr_info_offset;
> +}
> +
> +#define TDMR_ALIGNMENT		BIT_ULL(30)
> +#define TDMR_PFN_ALIGNMENT	(TDMR_ALIGNMENT >> PAGE_SHIFT)

This macro is never used. Maybe you can drop it from this patch.

> +#define TDMR_ALIGN_DOWN(_addr)	ALIGN_DOWN((_addr), TDMR_ALIGNMENT)
> +#define TDMR_ALIGN_UP(_addr)	ALIGN((_addr), TDMR_ALIGNMENT)
> +
> +static inline u64 tdmr_end(struct tdmr_info *tdmr)
> +{
> +	return tdmr->base + tdmr->size;
> +}
> +
> +/*
> + * Take the memory referenced in @tmb_list and populate the
> + * preallocated @tdmr_list, following all the special alignment
> + * and size rules for TDMR.
> + */
> +static int fill_out_tdmrs(struct list_head *tmb_list,
> +			  struct tdmr_info_list *tdmr_list)
> +{
> +	struct tdx_memblock *tmb;
> +	int tdmr_idx = 0;
> +
> +	/*
> +	 * Loop over TDX memory regions and fill out TDMRs to cover them.
> +	 * To keep it simple, always try to use one TDMR to cover one
> +	 * memory region.
> +	 *
> +	 * In practice TDX1.0 supports 64 TDMRs, which is big enough to
> +	 * cover all memory regions in reality if the admin doesn't use
> +	 * 'memmap' to create a bunch of discrete memory regions.  When
> +	 * there's a real problem, enhancement can be done to merge TDMRs
> +	 * to reduce the final number of TDMRs.
> +	 */
> +	list_for_each_entry(tmb, tmb_list, list) {
> +		struct tdmr_info *tdmr = tdmr_entry(tdmr_list, tdmr_idx);
> +		u64 start, end;
> +
> +		start = TDMR_ALIGN_DOWN(PFN_PHYS(tmb->start_pfn));
> +		end = TDMR_ALIGN_UP(PFN_PHYS(tmb->end_pfn));
> +
> +		/*
> +		 * A valid size indicates the current TDMR has already
> +		 * been filled out to cover the previous memory region(s).
> +		 */
> +		if (tdmr->size) {
> +			/*
> +			 * Loop to the next if the current memory region
> +			 * has already been fully covered.
> +			 */
> +			if (end <= tdmr_end(tdmr))
> +				continue;
> +
> +			/* Otherwise, skip the already covered part. */
> +			if (start < tdmr_end(tdmr))
> +				start = tdmr_end(tdmr);
> +
> +			/*
> +			 * Create a new TDMR to cover the current memory
> +			 * region, or the remaining part of it.
> +			 */
> +			tdmr_idx++;
> +			if (tdmr_idx >= tdmr_list->max_tdmrs) {
> +				pr_warn("initialization failed: TDMRs exhausted.\n");
> +				return -ENOSPC;
> +			}
> +
> +			tdmr = tdmr_entry(tdmr_list, tdmr_idx);
> +		}
> +
> +		tdmr->base = start;
> +		tdmr->size = end - start;
> +	}
> +
> +	/* @tdmr_idx is always the index of last valid TDMR. */
> +	tdmr_list->nr_consumed_tdmrs = tdmr_idx + 1;
> +
> +	return 0;
> +}
> +
>  /*
>   * Construct a list of TDMRs on the preallocated space in @tdmr_list
>   * to cover all TDX memory regions in @tmb_list based on the TDX module
> @@ -394,10 +481,15 @@ static int construct_tdmrs(struct list_head *tmb_list,
>  			   struct tdmr_info_list *tdmr_list,
>  			   struct tdsysinfo_struct *sysinfo)
>  {
> +	int ret;
> +
> +	ret = fill_out_tdmrs(tmb_list, tdmr_list);
> +	if (ret)
> +		return ret;
> +
>  	/*
>  	 * TODO:
>  	 *
> -	 *  - Fill out TDMRs to cover all TDX memory regions.
>  	 *  - Allocate and set up PAMTs for each TDMR.
>  	 *  - Designate reserved areas for each TDMR.
>  	 *

Rest looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
On Fri, 2023-06-09 at 02:02 +0300, kirill.shutemov@linux.intel.com wrote:
> On Mon, Jun 05, 2023 at 02:27:24AM +1200, Kai Huang wrote:
> > +#define TDMR_ALIGNMENT		BIT_ULL(30)
> 
> Nit: SZ_1G can be a little bit more readable here.

Will do.

> 
> Anyway:
> 
> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Thanks.
On Thu, 2023-06-08 at 21:01 -0700, Sathyanarayanan Kuppuswamy wrote:
> 
> On 6/4/23 7:27 AM, Kai Huang wrote:
> > Start to transit out the "multi-steps" to construct a list of "TD Memory
> > Regions" (TDMRs) to cover all TDX-usable memory regions.
> > 
> > The kernel configures TDX-usable memory regions by passing a list of
> > TDMRs "TD Memory Regions" (TDMRs) to the TDX module.  Each TDMR contains
> > the information of the base/size of a memory region, the base/size of the
> > associated Physical Address Metadata Table (PAMT) and a list of reserved
> > areas in the region.
> > 
> > Do the first step to fill out a number of TDMRs to cover all TDX memory
> > regions.  To keep it simple, always try to use one TDMR for each memory
> > region.  As the first step only set up the base/size for each TDMR.
> 
> As a first step?

Not sure there are two or more first steps?  I think I'll keep it as is.

[...]

> > +#define TDMR_ALIGNMENT		BIT_ULL(30)
> > +#define TDMR_PFN_ALIGNMENT	(TDMR_ALIGNMENT >> PAGE_SHIFT)
> 
> This macro is never used. Maybe you can drop it from this patch.

OK will do.

[...]

> 
> Rest looks good to me.
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 

Thanks.
> 
> Maybe not even a pr_warn(), but something that's a bit ominous and has a
> chance of getting users to act.

Sorry, I am not sure how to do this.  Could you give some suggestion?
On Mon, Jun 12, 2023 at 02:33:58AM +0000, Huang, Kai wrote:
> > 
> > Maybe not even a pr_warn(), but something that's a bit ominous and has a
> > chance of getting users to act.
> 
> Sorry I am not sure how to do. Could you give some suggestion?

Maybe something like this would do?

I'm struggling with the warning message. Any suggestion is welcome.

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 9cd4f6b58d4a..cc141025b249 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -627,6 +627,15 @@ static int fill_out_tdmrs(struct list_head *tmb_list,
 	/* @tdmr_idx is always the index of last valid TDMR. */
 	tdmr_list->nr_consumed_tdmrs = tdmr_idx + 1;
 
+	/*
+	 * Warn early that kernel is about to run out of TDMRs.
+	 *
+	 * This is indication that TDMR allocation has to be reworked to be
+	 * smarter to not run into an issue.
+	 */
+	if (tdmr_list->max_tdmrs - tdmr_list->nr_consumed_tdmrs < TDMR_NR_WARN)
+		pr_warn("Low number of spare TDMRs\n");
+
 	return 0;
 }
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 323ce744b853..17efe33847ae 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -98,6 +98,9 @@ struct tdx_memblock {
 	int nid;
 };
 
+/* Warn if kernel has less than TDMR_NR_WARN TDMRs after allocation */
+#define TDMR_NR_WARN 4
+
 struct tdmr_info_list {
 	void *tdmrs;	/* Flexible array to hold 'tdmr_info's */
 	int nr_consumed_tdmrs;	/* How many 'tdmr_info's are in use */
On Mon, 2023-06-12 at 17:33 +0300, kirill.shutemov@linux.intel.com wrote:
> On Mon, Jun 12, 2023 at 02:33:58AM +0000, Huang, Kai wrote:
> > > 
> > > Maybe not even a pr_warn(), but something that's a bit ominous and has a
> > > chance of getting users to act.
> > 
> > Sorry I am not sure how to do. Could you give some suggestion?
> 
> Maybe something like this would do?
> 
> I'm struggling with the warning message. Any suggestion is welcome.

I guess it would be helpful to print out the actual consumed TDMRs?

	pr_warn("consumed TDMRs reaching limit: %d used (out of %d)\n",
			tdmr_idx, tdmr_list->max_tdmrs);

> 
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 9cd4f6b58d4a..cc141025b249 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -627,6 +627,15 @@ static int fill_out_tdmrs(struct list_head *tmb_list,
>  	/* @tdmr_idx is always the index of last valid TDMR. */
>  	tdmr_list->nr_consumed_tdmrs = tdmr_idx + 1;
>  
> +	/*
> +	 * Warn early that kernel is about to run out of TDMRs.
> +	 *
> +	 * This is indication that TDMR allocation has to be reworked to be
> +	 * smarter to not run into an issue.
> +	 */
> +	if (tdmr_list->max_tdmrs - tdmr_list->nr_consumed_tdmrs < TDMR_NR_WARN)
> +		pr_warn("Low number of spare TDMRs\n");
> +
>  	return 0;
>  }
>  
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 323ce744b853..17efe33847ae 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -98,6 +98,9 @@ struct tdx_memblock {
>  	int nid;
>  };
>  
> +/* Warn if kernel has less than TDMR_NR_WARN TDMRs after allocation */
> +#define TDMR_NR_WARN 4
> +
>  struct tdmr_info_list {
>  	void *tdmrs;	/* Flexible array to hold 'tdmr_info's */
>  	int nr_consumed_tdmrs;	/* How many 'tdmr_info's are in use */
On Mon, Jun 12, 2023 at 10:10:39PM +0000, Huang, Kai wrote:
> On Mon, 2023-06-12 at 17:33 +0300, kirill.shutemov@linux.intel.com wrote:
> > On Mon, Jun 12, 2023 at 02:33:58AM +0000, Huang, Kai wrote:
> > > > 
> > > > Maybe not even a pr_warn(), but something that's a bit ominous and has a
> > > > chance of getting users to act.
> > > 
> > > Sorry I am not sure how to do. Could you give some suggestion?
> > 
> > Maybe something like this would do?
> > 
> > I'm struggling with the warning message. Any suggestion is welcome.
> 
> I guess it would be helpful to print out the actual consumed TDMRs?
> 
> 	pr_warn("consumed TDMRs reaching limit: %d used (out of %d)\n",
> 			tdmr_idx, tdmr_list->max_tdmrs);

It is off-by-one. It is supposed to be tdmr_idx + 1.
On Tue, 2023-06-13 at 13:18 +0300, kirill.shutemov@linux.intel.com wrote:
> On Mon, Jun 12, 2023 at 10:10:39PM +0000, Huang, Kai wrote:
> > On Mon, 2023-06-12 at 17:33 +0300, kirill.shutemov@linux.intel.com wrote:
> > > On Mon, Jun 12, 2023 at 02:33:58AM +0000, Huang, Kai wrote:
> > > > > 
> > > > > Maybe not even a pr_warn(), but something that's a bit ominous and has a
> > > > > chance of getting users to act.
> > > > 
> > > > Sorry I am not sure how to do. Could you give some suggestion?
> > > 
> > > Maybe something like this would do?
> > > 
> > > I'm struggling with the warning message. Any suggestion is welcome.
> > 
> > I guess it would be helpful to print out the actual consumed TDMRs?
> > 
> > 	pr_warn("consumed TDMRs reaching limit: %d used (out of %d)\n",
> > 			tdmr_idx, tdmr_list->max_tdmrs);
> 
> It is off-by-one. It is supposed to be tdmr_idx + 1.
> 

In your code, yes.  Thanks for pointing it out.  I copied it from my code.
On 4.06.23 г. 17:27 ч., Kai Huang wrote:

<snip>

> ---
>  arch/x86/virt/vmx/tdx/tdx.c | 94 ++++++++++++++++++++++++++++++++++++-
>  1 file changed, 93 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 7a20c72361e7..fa9fa8bc581a 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -385,6 +385,93 @@ static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
>  			tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
>  }
>  
> +/* Get the TDMR from the list at the given index. */
> +static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
> +				    int idx)
> +{
> +	int tdmr_info_offset = tdmr_list->tdmr_sz * idx;
> +
> +	return (void *)tdmr_list->tdmrs + tdmr_info_offset;

nit: I would just like to point out that arithmetic on 'void *' (i.e.
sizeof(void) being treated as 1) is a gcc-specific compiler extension:
https://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Pointer-Arith.html#Pointer-Arith

I don't know if clang treats it the same way, just for the sake of
simplicity you might wanna change this (void *) to (char *).

<snip>
On Wed, 2023-06-14 at 15:31 +0300, Nikolay Borisov wrote:
> 
> On 4.06.23 г. 17:27 ч., Kai Huang wrote:
> <snip>
> 
> > +/* Get the TDMR from the list at the given index. */
> > +static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
> > +				    int idx)
> > +{
> > +	int tdmr_info_offset = tdmr_list->tdmr_sz * idx;
> > +
> > +	return (void *)tdmr_list->tdmrs + tdmr_info_offset;
> 
> nit: I would just like to point out that arithmetic on 'void *' is a
> gcc-specific compiler extension:
> https://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Pointer-Arith.html#Pointer-Arith
> 
> I don't know if clang treats it the same way, just for the sake of
> simplicity you might wanna change this (void *) to (char *).

Then we will need an additional cast from 'char *' to 'struct tdmr_info *' I
suppose?  Not sure whether it is worth the additional cast.

And I found such 'void *' arithmetic is already used in other kernel code
too, e.g., below code in networking code:

	./net/rds/tcp_send.c:105:	(void *)&rm->m_inc.i_hdr + hdr_off,

and I believe there are other examples too (that I didn't spend a lot of
time to grep).

And it seems Linus also thinks "using arithmetic on 'void *' is generally
superior":

https://lore.kernel.org/lkml/CAHk-=whFKYMrF6euVvziW+drw7-yi1pYdf=uccnzJ8k09DoTXA@mail.gmail.com/t/#m983827708903c8c5bddf193343d392c9ed5af1a0

So I wouldn't worry about the Clang thing.
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 7a20c72361e7..fa9fa8bc581a 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -385,6 +385,93 @@ static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
 			tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
 }
 
+/* Get the TDMR from the list at the given index. */
+static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
+				    int idx)
+{
+	int tdmr_info_offset = tdmr_list->tdmr_sz * idx;
+
+	return (void *)tdmr_list->tdmrs + tdmr_info_offset;
+}
+
+#define TDMR_ALIGNMENT		BIT_ULL(30)
+#define TDMR_PFN_ALIGNMENT	(TDMR_ALIGNMENT >> PAGE_SHIFT)
+#define TDMR_ALIGN_DOWN(_addr)	ALIGN_DOWN((_addr), TDMR_ALIGNMENT)
+#define TDMR_ALIGN_UP(_addr)	ALIGN((_addr), TDMR_ALIGNMENT)
+
+static inline u64 tdmr_end(struct tdmr_info *tdmr)
+{
+	return tdmr->base + tdmr->size;
+}
+
+/*
+ * Take the memory referenced in @tmb_list and populate the
+ * preallocated @tdmr_list, following all the special alignment
+ * and size rules for TDMR.
+ */
+static int fill_out_tdmrs(struct list_head *tmb_list,
+			  struct tdmr_info_list *tdmr_list)
+{
+	struct tdx_memblock *tmb;
+	int tdmr_idx = 0;
+
+	/*
+	 * Loop over TDX memory regions and fill out TDMRs to cover them.
+	 * To keep it simple, always try to use one TDMR to cover one
+	 * memory region.
+	 *
+	 * In practice TDX1.0 supports 64 TDMRs, which is big enough to
+	 * cover all memory regions in reality if the admin doesn't use
+	 * 'memmap' to create a bunch of discrete memory regions.  When
+	 * there's a real problem, enhancement can be done to merge TDMRs
+	 * to reduce the final number of TDMRs.
+	 */
+	list_for_each_entry(tmb, tmb_list, list) {
+		struct tdmr_info *tdmr = tdmr_entry(tdmr_list, tdmr_idx);
+		u64 start, end;
+
+		start = TDMR_ALIGN_DOWN(PFN_PHYS(tmb->start_pfn));
+		end = TDMR_ALIGN_UP(PFN_PHYS(tmb->end_pfn));
+
+		/*
+		 * A valid size indicates the current TDMR has already
+		 * been filled out to cover the previous memory region(s).
+		 */
+		if (tdmr->size) {
+			/*
+			 * Loop to the next if the current memory region
+			 * has already been fully covered.
+			 */
+			if (end <= tdmr_end(tdmr))
+				continue;
+
+			/* Otherwise, skip the already covered part. */
+			if (start < tdmr_end(tdmr))
+				start = tdmr_end(tdmr);
+
+			/*
+			 * Create a new TDMR to cover the current memory
+			 * region, or the remaining part of it.
+			 */
+			tdmr_idx++;
+			if (tdmr_idx >= tdmr_list->max_tdmrs) {
+				pr_warn("initialization failed: TDMRs exhausted.\n");
+				return -ENOSPC;
+			}
+
+			tdmr = tdmr_entry(tdmr_list, tdmr_idx);
+		}
+
+		tdmr->base = start;
+		tdmr->size = end - start;
+	}
+
+	/* @tdmr_idx is always the index of last valid TDMR. */
+	tdmr_list->nr_consumed_tdmrs = tdmr_idx + 1;
+
+	return 0;
+}
+
 /*
  * Construct a list of TDMRs on the preallocated space in @tdmr_list
  * to cover all TDX memory regions in @tmb_list based on the TDX module
@@ -394,10 +481,15 @@ static int construct_tdmrs(struct list_head *tmb_list,
 			   struct tdmr_info_list *tdmr_list,
 			   struct tdsysinfo_struct *sysinfo)
 {
+	int ret;
+
+	ret = fill_out_tdmrs(tmb_list, tdmr_list);
+	if (ret)
+		return ret;
+
 	/*
 	 * TODO:
 	 *
-	 *  - Fill out TDMRs to cover all TDX memory regions.
 	 *  - Allocate and set up PAMTs for each TDMR.
 	 *  - Designate reserved areas for each TDMR.
 	 *
Start to transit out the "multi-steps" to construct a list of "TD Memory
Regions" (TDMRs) to cover all TDX-usable memory regions.

The kernel configures TDX-usable memory regions by passing a list of
"TD Memory Regions" (TDMRs) to the TDX module.  Each TDMR contains the
information of the base/size of a memory region, the base/size of the
associated Physical Address Metadata Table (PAMT) and a list of reserved
areas in the region.

Do the first step to fill out a number of TDMRs to cover all TDX memory
regions.  To keep it simple, always try to use one TDMR for each memory
region.  As the first step only set up the base/size for each TDMR.

Each TDMR must be 1G aligned and the size must be in 1G granularity.
This implies that one TDMR could cover multiple memory regions.  If a
memory region spans the 1GB boundary and the former part is already
covered by the previous TDMR, just use a new TDMR for the remaining
part.

TDX only supports a limited number of TDMRs.  Disable TDX if all TDMRs
are consumed but there are more memory regions to cover.

There are fancier things that could be done like trying to merge
adjacent TDMRs.  This would allow more pathological memory layouts to be
supported.  But, current systems are not even close to exhausting the
existing TDMR resources in practice.  For now, keep it simple.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v10 -> v11:
 - No update

v9 -> v10:
 - No change.

v8 -> v9:
 - Added the last paragraph in the changelog (Dave).
 - Removed unnecessary type cast in tdmr_entry() (Dave).

---
 arch/x86/virt/vmx/tdx/tdx.c | 94 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 93 insertions(+), 1 deletion(-)