Message ID | cover.1728903647.git.kai.huang@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | TDX host: metadata reading tweaks, bug fix and info dump | expand |
I'm having one of those "I hate this all" moments. Look at what we say
in the code:
> * See the "global_metadata.json" in the "TDX 1.5 ABI definitions".
Basically step one in verifying that this is all right is: Hey, humans,
please go parse a machine-readable format. That's insanity. If Intel
wants to publish JSON as the canonical source of truth, that's fine.
It's great, actually. But let's stop playing human JSON parser and make
the computers do it for us, OK?
Let's just generate the code. Basically, as long as the generated C is
marginally readable, I'm OK with it. The most important things are:
1. Adding a field is dirt simple
2. Using the generated C is simple
In 99% of the cases, nobody ends up having to ever look at the generated
code.
Take a look at the attached python program and generated C file. I
think they qualify. We can check the script into tools/scripts/ and it
can get re-run when new json comes out or when a new field is needed.
You'd could call the generated code like this:
#include <generated.h>
read_gunk(&tgm);
and use it like this:
foo = tgm.BUILD_NUM;
bar = tgm.BUILD_DATE;
Any field you want to add is a single addition to the python list and
re-running the script. There's not even any need to do:
#define TDX_FOO_BAR_BUILD_DATE 0x8800000200000001
because it's unnecessary when you have:
ret |= read_...(0x8800000200000001, &tgm.BUILD_DATE);
that links the magic number and the "BUILD_DATE" so closely together
anyway. We also don't need type safety *here* at the "read" because
it's machine generated in the first place. If there's a type mismatch
between "0x8800000200000001" and "tgm.BUILD_DATE" we have bigger
problems on our hands.
All the type checking comes when the code consumes tgm.BUILD_DATE (or
whatever).
On Tue, Oct 15, 2024 at 5:30 PM Dave Hansen <dave.hansen@intel.com> wrote: > > I'm having one of those "I hate this all" moments. Look at what we say > in the code: > > > * See the "global_metadata.json" in the "TDX 1.5 ABI definitions". > > Basically step one in verifying that this is all right is: Hey, humans, > please go parse a machine-readable format. That's insanity. If Intel > wants to publish JSON as the canonical source of truth, that's fine. > It's great, actually. But let's stop playing human JSON parser and make > the computers do it for us, OK? > > Let's just generate the code. Basically, as long as the generated C is > marginally readable, I'm OK with it. The most important things are: > > 1. Adding a field is dirt simple > 2. Using the generated C is simple > > In 99% of the cases, nobody ends up having to ever look at the generated > code. > > Take a look at the attached python program and generated C file. I > think they qualify. We can check the script into tools/scripts/ and it > can get re-run when new json comes out or when a new field is needed. > You'd could call the generated code like this: Ok, so let's move this thing forward. Here is a more polished script and the output. Untested beyond compilation. Kai, feel free to include it in v6 with my Signed-off-by: Paolo Bonzini <pbonzini@redhat.om> I made an attempt at adding array support and using it with the CMR information; just to see if Intel is actually trying to make global_metadata.json accurate. The original code has for (i = 0; i < sysinfo_cmr->num_cmrs; i++) { READ_SYS_INFO(CMR_BASE + i, cmr_base[i]); READ_SYS_INFO(CMR_SIZE + i, cmr_size[i]); } The generated code instead always tries to read 32 fields and returns non-zero from get_tdx_sys_info_cmr if they are missing. If it fails to read the fields above NUM_CMRS, just remove that part of the tdx.py script and make sure that a comment in the code shames the TDX ABI documentation adequately. :) Thanks, Paolo
Paolo Bonzini wrote: > On Tue, Oct 15, 2024 at 5:30 PM Dave Hansen <dave.hansen@intel.com> wrote: > > > > I'm having one of those "I hate this all" moments. Look at what we say > > in the code: > > > > > * See the "global_metadata.json" in the "TDX 1.5 ABI definitions". > > > > Basically step one in verifying that this is all right is: Hey, humans, > > please go parse a machine-readable format. That's insanity. If Intel > > wants to publish JSON as the canonical source of truth, that's fine. > > It's great, actually. But let's stop playing human JSON parser and make > > the computers do it for us, OK? > > > > Let's just generate the code. Basically, as long as the generated C is > > marginally readable, I'm OK with it. The most important things are: > > > > 1. Adding a field is dirt simple > > 2. Using the generated C is simple > > > > In 99% of the cases, nobody ends up having to ever look at the generated > > code. > > > > Take a look at the attached python program and generated C file. I > > think they qualify. We can check the script into tools/scripts/ and it > > can get re-run when new json comes out or when a new field is needed. > > You'd could call the generated code like this: > > Ok, so let's move this thing forward. Here is a more polished script > and the output. Untested beyond compilation. > > Kai, feel free to include it in v6 with my > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.om> > > I made an attempt at adding array support and using it with the CMR > information; just to see if Intel is actually trying to make > global_metadata.json accurate. The original code has > > for (i = 0; i < sysinfo_cmr->num_cmrs; i++) { > READ_SYS_INFO(CMR_BASE + i, cmr_base[i]); > READ_SYS_INFO(CMR_SIZE + i, cmr_size[i]); > } > > The generated code instead always tries to read 32 fields and returns > non-zero from get_tdx_sys_info_cmr if they are missing. If it fails to > read the fields above NUM_CMRS, just remove that part of the tdx.py > script and make sure that a comment in the code shames the TDX ABI > documentation adequately. :) Thanks for doing this Paolo, I regret not pushing harder [1] / polishing up the bash+jq script I threw together to do the same. I took a look at your script and the autogenerated code and it looks good to me. Feel free to add my Reviewed-by on a patch that adds that collateral to the tools/ directory. [1]: http://lore.kernel.org/66b19beaadd28_4fc729410@dwillia2-xfh.jf.intel.com.notmuch
On 16/10/2024 8:04 am, Dan Williams wrote: > Paolo Bonzini wrote: >> On Tue, Oct 15, 2024 at 5:30 PM Dave Hansen <dave.hansen@intel.com> wrote: >>> >>> I'm having one of those "I hate this all" moments. Look at what we say >>> in the code: >>> >>>> * See the "global_metadata.json" in the "TDX 1.5 ABI definitions". >>> >>> Basically step one in verifying that this is all right is: Hey, humans, >>> please go parse a machine-readable format. That's insanity. If Intel >>> wants to publish JSON as the canonical source of truth, that's fine. >>> It's great, actually. But let's stop playing human JSON parser and make >>> the computers do it for us, OK? >>> >>> Let's just generate the code. Basically, as long as the generated C is >>> marginally readable, I'm OK with it. The most important things are: >>> >>> 1. Adding a field is dirt simple >>> 2. Using the generated C is simple >>> >>> In 99% of the cases, nobody ends up having to ever look at the generated >>> code. >>> >>> Take a look at the attached python program and generated C file. I >>> think they qualify. We can check the script into tools/scripts/ and it >>> can get re-run when new json comes out or when a new field is needed. >>> You'd could call the generated code like this: >> >> Ok, so let's move this thing forward. Here is a more polished script >> and the output. Untested beyond compilation. >> >> Kai, feel free to include it in v6 with my >> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.om> >> >> I made an attempt at adding array support and using it with the CMR >> information; just to see if Intel is actually trying to make >> global_metadata.json accurate. The original code has >> >> for (i = 0; i < sysinfo_cmr->num_cmrs; i++) { >> READ_SYS_INFO(CMR_BASE + i, cmr_base[i]); >> READ_SYS_INFO(CMR_SIZE + i, cmr_size[i]); >> } >> >> The generated code instead always tries to read 32 fields and returns >> non-zero from get_tdx_sys_info_cmr if they are missing. If it fails to >> read the fields above NUM_CMRS, just remove that part of the tdx.py >> script and make sure that a comment in the code shames the TDX ABI >> documentation adequately. :) > > Thanks for doing this Paolo, I regret not pushing harder [1] / polishing > up the bash+jq script I threw together to do the same. > > I took a look at your script and the autogenerated code and it looks good > to me. > > Feel free to add my Reviewed-by on a patch that adds that collateral to > the tools/ directory. > > [1]: http://lore.kernel.org/66b19beaadd28_4fc729410@dwillia2-xfh.jf.intel.com.notmuch Hi Dave/Paolo/Dan, I'll go with this for the next version. Thanks for all your feedback and the code you shared. I appreciate.
On Tue, 2024-10-15 at 18:29 +0200, Paolo Bonzini wrote: > On Tue, Oct 15, 2024 at 5:30 PM Dave Hansen <dave.hansen@intel.com> wrote: > > > > I'm having one of those "I hate this all" moments. Look at what we say > > in the code: > > > > > * See the "global_metadata.json" in the "TDX 1.5 ABI definitions". > > > > Basically step one in verifying that this is all right is: Hey, humans, > > please go parse a machine-readable format. That's insanity. If Intel > > wants to publish JSON as the canonical source of truth, that's fine. > > It's great, actually. But let's stop playing human JSON parser and make > > the computers do it for us, OK? > > > > Let's just generate the code. Basically, as long as the generated C is > > marginally readable, I'm OK with it. The most important things are: > > > > 1. Adding a field is dirt simple > > 2. Using the generated C is simple > > > > In 99% of the cases, nobody ends up having to ever look at the generated > > code. > > > > Take a look at the attached python program and generated C file. I > > think they qualify. We can check the script into tools/scripts/ and it > > can get re-run when new json comes out or when a new field is needed. > > You'd could call the generated code like this: > > Ok, so let's move this thing forward. Here is a more polished script > and the output. Untested beyond compilation. > > Kai, feel free to include it in v6 with my > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.om> Hi Dave, Paolo, I updated the script mainly for two purposes: 1) Add auto-generation of main structure 'struct tdx_sys_info' and the main function 'get_tdx_sys_info()' which reads metadata to the main structure. Without it, adding a new field of a new "Class" won't be that simple. Paolo's script only generates 'struct tdx_sys_info_xx' sub-structures and 'get_tdx_sys_info_xx()' sub-functions. We will need to manually add the new sub-structure to 'struct tdx_sys_info' and add code to call the new function to read it. 2) Add support of reading large metadata field which has multiple 64-bit elements. Yes a metadata field can consist multiple 64-bit elements. The example is CPUID_CONFIG_VALUE, which is an array, with each field consisting 2 64-bit elements to represent 128-bit CPUID value EAX/EBX/ECX/EDX. KVM will need to use this CPUID_CONFIG_VALUE to create TD. The way to read such fields (see TDX 1.5 Base spec, 18.6.3. Arrays of Metadata Fields, Figure 18.3: Example of an Array of Four 48 Byte TDCS.RTMR Fields, Each Composed of 6 Elements): Base field ID: X --------------------------------- | elem 0 (X) | elem 1 (X+1) | Field 0 --------------------------------- | elem 0 (X+2) | elem 1 (X+3) | Field 1 --------------------------------- .... To make generation simple, I updated the script to generate such case as two- dimensional array. E.g., Field Name: "XXX" Num Field: M Num Elements: N -> u64 xxx[M][N]; And the code to read: for (i = 0; i < M; i++) for (j = 0; j < N; J++) read...(BASE_ID + i * N + j, &val); There are also some other minor updates. E.g., after asking TDX module guys, they said we should use value reported via "NUM_CMRS" to loop when reading CMRs, so I adjusted the script to use 'sysinfo_cmr->num_cmrs' to loop. I pasted the script below, so that I can have a premalink of this script to use in the next version of this series (for the sake to make the auto-generated code reproducible). I also attached the updated script and auto-generated files. Btw, the checkpatch.pl has below complain about the generated code: ERROR: do not use assignment in if condition #112: FILE: arch/x86/virt/vmx/tdx/tdx_global_metadata.c:15: + if (!ret && !(ret = read_sys_metadata_field(0x8800000200000001, &val))) I didn't address this because this seems fine to me. --- the updated script --- #! /usr/bin/env python3 import json import sys # Note: this script does not run as part of the build process. # It is used to generate structs from the TDX global_metadata.json # file, and functions to fill in said structs. Rerun it if # you need more fields. TDX_STRUCTS = { "version": [ "BUILD_DATE", "BUILD_NUM", "MINOR_VERSION", "MAJOR_VERSION", "UPDATE_VERSION", "INTERNAL_VERSION", ], "features": [ "TDX_FEATURES0" ], "tdmr": [ "MAX_TDMRS", "MAX_RESERVED_PER_TDMR", "PAMT_4K_ENTRY_SIZE", "PAMT_2M_ENTRY_SIZE", "PAMT_1G_ENTRY_SIZE", ], "cmr": [ "NUM_CMRS", "CMR_BASE", "CMR_SIZE" ], # "td_ctrl": [ # "TDR_BASE_SIZE", # "TDCS_BASE_SIZE", # "TDVPS_BASE_SIZE", # ], # "td_conf": [ # "ATTRIBUTES_FIXED0", # "ATTRIBUTES_FIXED1", # "XFAM_FIXED0", # "XFAM_FIXED1", # "NUM_CPUID_CONFIG", # "MAX_VCPUS_PER_TD", # "CPUID_CONFIG_LEAVES", # "CPUID_CONFIG_VALUES", # ], } def print_class_struct_field(field_name, element_bytes, num_fields, num_elements, file): element_type = "u%s" % (element_bytes * 8) element_array = "" if num_fields > 1: element_array += "[%d]" % (num_fields) if num_elements > 1: element_array += "[%d]" % (num_elements) print("\t%s %s%s;" % (element_type, field_name, element_array), file=file) def print_class_struct(class_name, fields, file): struct_name = "tdx_sys_info_%s" % (class_name) print("struct %s {" % (struct_name), file=file) for f in fields: print_class_struct_field( f["Field Name"].lower(), int(f["Element Size (Bytes)"]), int(f["Num Fields"]), int(f["Num Elements"]), file=file) print("};", file=file) def print_read_field(field_id, struct_var, struct_member, indent, file): print( "%sif (!ret && !(ret = read_sys_metadata_field(%s, &val)))\n%s\t%s->%s = val;" % (indent, field_id, indent, struct_var, struct_member), file=file, ) def print_class_function(class_name, fields, file): func_name = "get_tdx_sys_info_%s" % (class_name) struct_name = "tdx_sys_info_%s" % (class_name) struct_var = "sysinfo_%s" % (class_name) print("static int %s(struct %s *%s)" % (func_name, struct_name, struct_var), file=file) print("{", file=file) print("\tint ret = 0;", file=file) print("\tu64 val;", file=file) has_i = 0 has_j = 0 for f in fields: num_fields = int(f["Num Fields"]) num_elements = int(f["Num Elements"]) if num_fields > 1: has_i = 1 if num_elements > 1: has_j = 1 if has_i == 1 and has_j == 1: print("\tint i, j;", file=file) elif has_i == 1: print("\tint i;", file=file) print(file=file) for f in fields: fname = f["Field Name"] field_id = f["Base FIELD_ID (Hex)"] num_fields = int(f["Num Fields"]) num_elements = int(f["Num Elements"]) struct_member = fname.lower() indent = "\t" if num_fields > 1: if fname == "CMR_BASE" or fname == "CMR_SIZE": limit = "sysinfo_cmr->num_cmrs" elif fname == "CPUID_CONFIG_LEAVES" or fname == "CPUID_CONFIG_VALUES": limit = "sysinfo_td_conf->num_cpuid_config" else: limit = "%d" %(num_fields) print("%sfor (i = 0; i < %s; i++)" % (indent, limit), file=file) indent += "\t" field_id += " + i" struct_member += "[i]" if num_elements > 1: print("%sfor (j = 0; j < %d; j++)" % (indent, num_elements), file=file) indent += "\t" field_id += " * 2 + j" struct_member += "[j]" print_read_field( field_id, struct_var, struct_member, indent, file=file, ) print(file=file) print("\treturn ret;", file=file) print("}", file=file) def print_main_struct(file): print("struct tdx_sys_info {", file=file) for class_name, field_names in TDX_STRUCTS.items(): struct_name = "tdx_sys_info_%s" % (class_name) struct_var = class_name print("\tstruct %s %s;" % (struct_name, struct_var), file=file) print("};", file=file) def print_main_function(file): print("static int get_tdx_sys_info(struct tdx_sys_info *sysinfo)", file=file) print("{", file=file) print("\tint ret = 0;", file=file) print(file=file) for class_name, field_names in TDX_STRUCTS.items(): func_name = "get_tdx_sys_info_" + class_name struct_var = class_name print("\tret = ret ?: %s(&sysinfo->%s);" % (func_name, struct_var), file=file) print(file=file) print("\treturn ret;", file=file) print("}", file=file) jsonfile = sys.argv[1] hfile = sys.argv[2] cfile = sys.argv[3] hfileifdef = hfile.replace(".", "_") with open(jsonfile, "r") as f: json_in = json.load(f) fields = {x["Field Name"]: x for x in json_in["Fields"]} with open(hfile, "w") as f: print("/* SPDX-License-Identifier: GPL-2.0 */", file=f) print("/* Automatically generated TDX global metadata structures. */", file=f) print("#ifndef _X86_VIRT_TDX_AUTO_GENERATED_" + hfileifdef.upper(), file=f) print("#define _X86_VIRT_TDX_AUTO_GENERATED_" + hfileifdef.upper(), file=f) print(file=f) print("#include <linux/types.h>", file=f) print(file=f) for class_name, field_names in TDX_STRUCTS.items(): print_class_struct(class_name, [fields[x] for x in field_names], file=f) print(file=f) print_main_struct(file=f) print(file=f) print("#endif", file=f) with open(cfile, "w") as f: print("// SPDX-License-Identifier: GPL-2.0", file=f) print("/*", file=f) print(" * Automatically generated functions to read TDX global metadata.", file=f) print(" *", file=f) print(" * This file doesn't compile on its own as it lacks of inclusion", file=f) print(" * of SEAMCALL wrapper primitive which reads global metadata.", file=f) print(" * Include this file to other C file instead.", file=f) print(" */", file=f) for class_name, field_names in TDX_STRUCTS.items(): print(file=f) print_class_function(class_name, [fields[x] for x in field_names], file=f) print(file=f) print_main_function(file=f)