diff mbox series

[v7,02/20] x86/virt/tdx: Detect TDX during kernel boot

Message ID aaee2d5332a97c840ad401ba935842a998a877ec.1668988357.git.kai.huang@intel.com (mailing list archive)
State New, archived
Headers show
Series TDX host kernel support | expand

Commit Message

Huang, Kai Nov. 21, 2022, 12:26 a.m. UTC
Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
host and certain physical attacks.  A CPU-attested software module
called 'the TDX module' runs inside a new isolated memory range as a
trusted hypervisor to manage and run protected VMs.

Pre-TDX Intel hardware has support for a memory encryption architecture
called MKTME.  The memory encryption hardware underpinning MKTME is also
used for Intel TDX.  TDX ends up "stealing" some of the physical address
space from the MKTME architecture for crypto-protection to VMs.  The
BIOS is responsible for partitioning the "KeyID" space between legacy
MKTME and TDX.  The KeyIDs reserved for TDX are called 'TDX private
KeyIDs' or 'TDX KeyIDs' for short.

TDX doesn't trust the BIOS.  During machine boot, TDX verifies the TDX
private KeyIDs are consistently and correctly programmed by the BIOS
across all CPU packages before it enables TDX on any CPU core.  A valid
TDX private KeyID range on BSP indicates TDX has been enabled by the
BIOS, otherwise the BIOS is buggy.

The TDX module is expected to be loaded by the BIOS when it enables TDX,
but the kernel needs to properly initialize it before it can be used to
create and run any TDX guests.  The TDX module will be initialized at
runtime by the user (i.e. KVM) on demand.

Add a new early_initcall(tdx_init) to do TDX early boot initialization.
Only detect TDX private KeyIDs for now.  Some other early checks will
follow up.  Also add a new function to report whether TDX has been
enabled by BIOS (TDX private KeyID range is valid).  Kexec() will also
need it to determine whether need to flush dirty cachelines that are
associated with any TDX private KeyIDs before booting to the new kernel.

To start to support TDX, create a new arch/x86/virt/vmx/tdx/tdx.c for
TDX host kernel support.  Add a new Kconfig option CONFIG_INTEL_TDX_HOST
to opt-in TDX host kernel support (to distinguish with TDX guest kernel
support).  So far only KVM is the only user of TDX.  Make the new config
option depend on KVM_INTEL.

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - No change.

v5 -> v6:
 - Removed SEAMRR detection to make code simpler.
 - Removed the 'default N' in the KVM_TDX_HOST Kconfig (Kirill).
 - Changed to use 'obj-y' in arch/x86/virt/vmx/tdx/Makefile (Kirill).


---
 arch/x86/Kconfig               | 12 +++++
 arch/x86/Makefile              |  2 +
 arch/x86/include/asm/tdx.h     |  7 +++
 arch/x86/virt/Makefile         |  2 +
 arch/x86/virt/vmx/Makefile     |  2 +
 arch/x86/virt/vmx/tdx/Makefile |  2 +
 arch/x86/virt/vmx/tdx/tdx.c    | 95 ++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h    | 15 ++++++
 8 files changed, 137 insertions(+)
 create mode 100644 arch/x86/virt/Makefile
 create mode 100644 arch/x86/virt/vmx/Makefile
 create mode 100644 arch/x86/virt/vmx/tdx/Makefile
 create mode 100644 arch/x86/virt/vmx/tdx/tdx.c
 create mode 100644 arch/x86/virt/vmx/tdx/tdx.h

Comments

Kuppuswamy Sathyanarayanan Nov. 21, 2022, 3:07 a.m. UTC | #1
On 11/20/22 4:26 PM, Kai Huang wrote:
> Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
> host and certain physical attacks.  A CPU-attested software module
> called 'the TDX module' runs inside a new isolated memory range as a
> trusted hypervisor to manage and run protected VMs.
> 
> Pre-TDX Intel hardware has support for a memory encryption architecture
> called MKTME.  The memory encryption hardware underpinning MKTME is also
> used for Intel TDX.  TDX ends up "stealing" some of the physical address
> space from the MKTME architecture for crypto-protection to VMs.  The
> BIOS is responsible for partitioning the "KeyID" space between legacy
> MKTME and TDX.  The KeyIDs reserved for TDX are called 'TDX private
> KeyIDs' or 'TDX KeyIDs' for short.
> 
> TDX doesn't trust the BIOS.  During machine boot, TDX verifies the TDX
> private KeyIDs are consistently and correctly programmed by the BIOS
> across all CPU packages before it enables TDX on any CPU core.  A valid
> TDX private KeyID range on BSP indicates TDX has been enabled by the
> BIOS, otherwise the BIOS is buggy.
> 
> The TDX module is expected to be loaded by the BIOS when it enables TDX,
> but the kernel needs to properly initialize it before it can be used to
> create and run any TDX guests.  The TDX module will be initialized at
> runtime by the user (i.e. KVM) on demand.
> 
> Add a new early_initcall(tdx_init) to do TDX early boot initialization.
> Only detect TDX private KeyIDs for now.  Some other early checks will
> follow up.  Also add a new function to report whether TDX has been
> enabled by BIOS (TDX private KeyID range is valid).  Kexec() will also
> need it to determine whether need to flush dirty cachelines that are
> associated with any TDX private KeyIDs before booting to the new kernel.
> 
> To start to support TDX, create a new arch/x86/virt/vmx/tdx/tdx.c for
> TDX host kernel support.  Add a new Kconfig option CONFIG_INTEL_TDX_HOST
> to opt-in TDX host kernel support (to distinguish with TDX guest kernel
> support).  So far only KVM is the only user of TDX.  Make the new config
> option depend on KVM_INTEL.
> 
> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
> 
> v6 -> v7:
>  - No change.
> 
> v5 -> v6:
>  - Removed SEAMRR detection to make code simpler.
>  - Removed the 'default N' in the KVM_TDX_HOST Kconfig (Kirill).
>  - Changed to use 'obj-y' in arch/x86/virt/vmx/tdx/Makefile (Kirill).
> 
> 
> ---
>  arch/x86/Kconfig               | 12 +++++
>  arch/x86/Makefile              |  2 +
>  arch/x86/include/asm/tdx.h     |  7 +++
>  arch/x86/virt/Makefile         |  2 +
>  arch/x86/virt/vmx/Makefile     |  2 +
>  arch/x86/virt/vmx/tdx/Makefile |  2 +
>  arch/x86/virt/vmx/tdx/tdx.c    | 95 ++++++++++++++++++++++++++++++++++
>  arch/x86/virt/vmx/tdx/tdx.h    | 15 ++++++
>  8 files changed, 137 insertions(+)
>  create mode 100644 arch/x86/virt/Makefile
>  create mode 100644 arch/x86/virt/vmx/Makefile
>  create mode 100644 arch/x86/virt/vmx/tdx/Makefile
>  create mode 100644 arch/x86/virt/vmx/tdx/tdx.c
>  create mode 100644 arch/x86/virt/vmx/tdx/tdx.h
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 67745ceab0db..cced4ef3bfb2 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1953,6 +1953,18 @@ config X86_SGX
>  
>  	  If unsure, say N.
>  
> +config INTEL_TDX_HOST
> +	bool "Intel Trust Domain Extensions (TDX) host support"
> +	depends on CPU_SUP_INTEL
> +	depends on X86_64
> +	depends on KVM_INTEL
> +	help
> +	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
> +	  host and certain physical attacks.  This option enables necessary TDX
> +	  support in host kernel to run protected VMs.
> +
> +	  If unsure, say N.
> +
>  config EFI
>  	bool "EFI runtime service support"
>  	depends on ACPI
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 415a5d138de4..38d3e8addc5f 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -246,6 +246,8 @@ archheaders:
>  
>  libs-y  += arch/x86/lib/
>  
> +core-y += arch/x86/virt/
> +
>  # drivers-y are linked after core-y
>  drivers-$(CONFIG_MATH_EMULATION) += arch/x86/math-emu/
>  drivers-$(CONFIG_PCI)            += arch/x86/pci/
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index e9a3f4a6fba1..51c4222a13ae 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -98,5 +98,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
>  	return -ENODEV;
>  }
>  #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */
> +
> +#ifdef CONFIG_INTEL_TDX_HOST
> +bool platform_tdx_enabled(void);
> +#else	/* !CONFIG_INTEL_TDX_HOST */
> +static inline bool platform_tdx_enabled(void) { return false; }
> +#endif	/* CONFIG_INTEL_TDX_HOST */
> +
>  #endif /* !__ASSEMBLY__ */
>  #endif /* _ASM_X86_TDX_H */
> diff --git a/arch/x86/virt/Makefile b/arch/x86/virt/Makefile
> new file mode 100644
> index 000000000000..1e36502cd738
> --- /dev/null
> +++ b/arch/x86/virt/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-y	+= vmx/
> diff --git a/arch/x86/virt/vmx/Makefile b/arch/x86/virt/vmx/Makefile
> new file mode 100644
> index 000000000000..feebda21d793
> --- /dev/null
> +++ b/arch/x86/virt/vmx/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_INTEL_TDX_HOST)	+= tdx/
> diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile
> new file mode 100644
> index 000000000000..93ca8b73e1f1
> --- /dev/null
> +++ b/arch/x86/virt/vmx/tdx/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-y += tdx.o
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> new file mode 100644
> index 000000000000..982d9c453b6b
> --- /dev/null
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright(c) 2022 Intel Corporation.
> + *
> + * Intel Trusted Domain Extensions (TDX) support
> + */
> +
> +#define pr_fmt(fmt)	"tdx: " fmt
> +
> +#include <linux/types.h>
> +#include <linux/init.h>
> +#include <linux/printk.h>
> +#include <asm/msr-index.h>
> +#include <asm/msr.h>
> +#include <asm/tdx.h>
> +#include "tdx.h"
> +
> +static u32 tdx_keyid_start __ro_after_init;
> +static u32 tdx_keyid_num __ro_after_init;
> +
> +/*
> + * Detect TDX private KeyIDs to see whether TDX has been enabled by the
> + * BIOS.  Both initializing the TDX module and running TDX guest require
> + * TDX private KeyID.
> + *
> + * TDX doesn't trust BIOS.  TDX verifies all configurations from BIOS
> + * are correct before enabling TDX on any core.  TDX requires the BIOS
> + * to correctly and consistently program TDX private KeyIDs on all CPU
> + * packages.  Unless there is a BIOS bug, detecting a valid TDX private
> + * KeyID range on BSP indicates TDX has been enabled by the BIOS.  If
> + * there's such BIOS bug, it will be caught later when initializing the
> + * TDX module.
> + */
> +static int __init detect_tdx(void)
> +{
> +	int ret;
> +
> +	/*
> +	 * IA32_MKTME_KEYID_PARTIONING:
> +	 *   Bit [31:0]:	Number of MKTME KeyIDs.
> +	 *   Bit [63:32]:	Number of TDX private KeyIDs.
> +	 */
> +	ret = rdmsr_safe(MSR_IA32_MKTME_KEYID_PARTITIONING, &tdx_keyid_start,
> +			&tdx_keyid_num);
> +	if (ret)
> +		return -ENODEV;
> +
> +	if (!tdx_keyid_num)
> +		return -ENODEV;
> +
> +	/*
> +	 * KeyID 0 is for TME.  MKTME KeyIDs start from 1.  TDX private
> +	 * KeyIDs start after the last MKTME KeyID.
> +	 */
> +	tdx_keyid_start++;

It is not very clear why you increment tdx_keyid_start. What we read from
MSR_IA32_MKTME_KEYID_PARTITIONING is not the correct start keyid?

Also why is this global variable? At least in this patch, there seems to
be no use case.

> +
> +	pr_info("TDX enabled by BIOS. TDX private KeyID range: [%u, %u)\n",
> +			tdx_keyid_start, tdx_keyid_start + tdx_keyid_num);
> +
> +	return 0;
> +}
> +
> +static void __init clear_tdx(void)
> +{
> +	tdx_keyid_start = tdx_keyid_num = 0;
> +}
> +
> +static int __init tdx_init(void)
> +{
> +	if (detect_tdx())
> +		return -ENODEV;
> +
> +	/*
> +	 * Initializing the TDX module requires one TDX private KeyID.
> +	 * If there's only one TDX KeyID then after module initialization
> +	 * KVM won't be able to run any TDX guest, which makes the whole
> +	 * thing worthless.  Just disable TDX in this case.
> +	 */
> +	if (tdx_keyid_num < 2) {
> +		pr_info("Disable TDX as there's only one TDX private KeyID available.\n");
> +		goto no_tdx;
> +	}
> +
> +	return 0;
> +no_tdx:
> +	clear_tdx();
> +	return -ENODEV;
> +}
> +early_initcall(tdx_init);
> +
> +/* Return whether the BIOS has enabled TDX */
> +bool platform_tdx_enabled(void)
> +{
> +	return !!tdx_keyid_num;
> +}
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> new file mode 100644
> index 000000000000..d00074abcb20
> --- /dev/null
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _X86_VIRT_TDX_H
> +#define _X86_VIRT_TDX_H
> +
> +/*
> + * This file contains both macros and data structures defined by the TDX
> + * architecture and Linux defined software data structures and functions.
> + * The two should not be mixed together for better readability.  The
> + * architectural definitions come first.
> + */
> +
> +/* MSR to report KeyID partitioning between MKTME and TDX */
> +#define MSR_IA32_MKTME_KEYID_PARTITIONING	0x00000087
> +
> +#endif
Huang, Kai Nov. 21, 2022, 9:37 a.m. UTC | #2
> > +static u32 tdx_keyid_start __ro_after_init;
> > +static u32 tdx_keyid_num __ro_after_init;
> > +
> > +/*
> > + * Detect TDX private KeyIDs to see whether TDX has been enabled by the
> > + * BIOS.  Both initializing the TDX module and running TDX guest require
> > + * TDX private KeyID.
> > + *
> > + * TDX doesn't trust BIOS.  TDX verifies all configurations from BIOS
> > + * are correct before enabling TDX on any core.  TDX requires the BIOS
> > + * to correctly and consistently program TDX private KeyIDs on all CPU
> > + * packages.  Unless there is a BIOS bug, detecting a valid TDX private
> > + * KeyID range on BSP indicates TDX has been enabled by the BIOS.  If
> > + * there's such BIOS bug, it will be caught later when initializing the
> > + * TDX module.
> > + */
> > +static int __init detect_tdx(void)
> > +{
> > +	int ret;
> > +
> > +	/*
> > +	 * IA32_MKTME_KEYID_PARTIONING:
> > +	 *   Bit [31:0]:	Number of MKTME KeyIDs.
> > +	 *   Bit [63:32]:	Number of TDX private KeyIDs.
> > +	 */
> > +	ret = rdmsr_safe(MSR_IA32_MKTME_KEYID_PARTITIONING, &tdx_keyid_start,
> > +			&tdx_keyid_num);
> > +	if (ret)
> > +		return -ENODEV;
> > +
> > +	if (!tdx_keyid_num)
> > +		return -ENODEV;
> > +
> > +	/*
> > +	 * KeyID 0 is for TME.  MKTME KeyIDs start from 1.  TDX private
> > +	 * KeyIDs start after the last MKTME KeyID.
> > +	 */
> > +	tdx_keyid_start++;
> 
> It is not very clear why you increment tdx_keyid_start. 
> 

Please see above comment around rdmsr_safe():

	/*
	 * IA32_MKTME_KEYID_PARTIONING:
	 *   Bit [31:0]:	Number of MKTME KeyIDs.
	 *   Bit [63:32]:	Number of TDX private KeyIDs.
	 */

And the comment right above 'tdx_keyid_start++':

	/*
	 * KeyID 0 is for TME.  MKTME KeyIDs start from 1.  TDX private
	 * KeyIDs start after the last MKTME KeyID.
	 */

Do the above two comments answer the question?

Or I can be more explicit, such as below?

"
Now tdx_start_keyid is the last MKTME KeyID.  TDX private KeyIDs start after the
last MKTME KeyID.  Increase tdx_start_keyid by 1 to set it to the first TDX
private KeyID.
"


> What we read from
> MSR_IA32_MKTME_KEYID_PARTITIONING is not the correct start keyid?


TDX verifies TDX private KeyID range is configured consistently across all
packages.  Any wrong KeyID range means BIOS bug, and such bug will cause TDX
being not enabled -- later TDX module initialization will catch this.

> 
> Also why is this global variable? At least in this patch, there seems to
> be no use case.

Platform_tdx_enabled() uses tdx_keyid_num to determine whether TDX is enabled by
BIOS.

Also, in the changlog I can add "both initializing the TDX module and creating
TDX guest will need to use TDX private KeyID".

But I also have a comment saying something similar around ...

> 
> > +	/*
> > +	 * Initializing the TDX module requires one TDX private KeyID.
> > +	 * If there's only one TDX KeyID then after module initialization
> > +	 * KVM won't be able to run any TDX guest, which makes the whole
> > +	 * thing worthless.  Just disable TDX in this case.
> > +	 */
> > +	if (tdx_keyid_num < 2) {
> > +		pr_info("Disable TDX as there's only one TDX private KeyID available.\n");
> > +		goto no_tdx;
> > +	}
> > +

... here.
Kuppuswamy Sathyanarayanan Nov. 21, 2022, 11:57 p.m. UTC | #3
On 11/21/22 1:37 AM, Huang, Kai wrote:
>> Also why is this global variable? At least in this patch, there seems to
>> be no use case.
> Platform_tdx_enabled() uses tdx_keyid_num to determine whether TDX is enabled by
> BIOS.
> 
> Also, in the changlog I can add "both initializing the TDX module and creating
> TDX guest will need to use TDX private KeyID".
> 
> But I also have a comment saying something similar around ...
> 

I am asking about the tdx_keyid_start. It mainly used in detect_tdx(). Maybe you
declared it as global as a preparation for next patches. But it is not explained
in change log.
Dave Hansen Nov. 22, 2022, 12:10 a.m. UTC | #4
On 11/20/22 16:26, Kai Huang wrote:
> Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
> host and certain physical attacks.  A CPU-attested software module
> called 'the TDX module' runs inside a new isolated memory range as a
> trusted hypervisor to manage and run protected VMs.
> 
> Pre-TDX Intel hardware has support for a memory encryption architecture
> called MKTME.  The memory encryption hardware underpinning MKTME is also
> used for Intel TDX.  TDX ends up "stealing" some of the physical address
> space from the MKTME architecture for crypto-protection to VMs.  The
> BIOS is responsible for partitioning the "KeyID" space between legacy
> MKTME and TDX.  The KeyIDs reserved for TDX are called 'TDX private
> KeyIDs' or 'TDX KeyIDs' for short.
> 
> TDX doesn't trust the BIOS.  During machine boot, TDX verifies the TDX
> private KeyIDs are consistently and correctly programmed by the BIOS
> across all CPU packages before it enables TDX on any CPU core.  A valid
> TDX private KeyID range on BSP indicates TDX has been enabled by the
> BIOS, otherwise the BIOS is buggy.
> 
> The TDX module is expected to be loaded by the BIOS when it enables TDX,
> but the kernel needs to properly initialize it before it can be used to
> create and run any TDX guests.  The TDX module will be initialized at
> runtime by the user (i.e. KVM) on demand.

Calling KVM "the user" is a stretch.  How about we give actual user
facts instead of filling this with i.e.'s when there's only one actual
way it happens?

	The TDX module will be initialized by the KVM subsystem when
	<insert actual trigger description here>.

> Add a new early_initcall(tdx_init) to do TDX early boot initialization.
> Only detect TDX private KeyIDs for now.  Some other early checks will
> follow up.

Just say what this patch is doing.  Don't try to

>  Also add a new function to report whether TDX has been
> enabled by BIOS (TDX private KeyID range is valid).  Kexec() will also
> need it to determine whether need to flush dirty cachelines that are
> associated with any TDX private KeyIDs before booting to the new kernel.

That last sentence doesn't parse correctly.

> To start to support TDX, create a new arch/x86/virt/vmx/tdx/tdx.c for
> TDX host kernel support.  Add a new Kconfig option CONFIG_INTEL_TDX_HOST
> to opt-in TDX host kernel support (to distinguish with TDX guest kernel
> support).  So far only KVM is the only user of TDX.  Make the new config
> option depend on KVM_INTEL.
..
> +config INTEL_TDX_HOST
> +	bool "Intel Trust Domain Extensions (TDX) host support"
> +	depends on CPU_SUP_INTEL
> +	depends on X86_64
> +	depends on KVM_INTEL
> +	help
> +	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
> +	  host and certain physical attacks.  This option enables necessary TDX
> +	  support in host kernel to run protected VMs.
> +
> +	  If unsure, say N.
> +
>  config EFI
>  	bool "EFI runtime service support"
>  	depends on ACPI
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 415a5d138de4..38d3e8addc5f 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -246,6 +246,8 @@ archheaders:
>  
>  libs-y  += arch/x86/lib/
>  
> +core-y += arch/x86/virt/
> +
>  # drivers-y are linked after core-y
>  drivers-$(CONFIG_MATH_EMULATION) += arch/x86/math-emu/
>  drivers-$(CONFIG_PCI)            += arch/x86/pci/
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index e9a3f4a6fba1..51c4222a13ae 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -98,5 +98,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
>  	return -ENODEV;
>  }
>  #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */
> +
> +#ifdef CONFIG_INTEL_TDX_HOST
> +bool platform_tdx_enabled(void);
> +#else	/* !CONFIG_INTEL_TDX_HOST */
> +static inline bool platform_tdx_enabled(void) { return false; }
> +#endif	/* CONFIG_INTEL_TDX_HOST */
> +
>  #endif /* !__ASSEMBLY__ */
>  #endif /* _ASM_X86_TDX_H */
> diff --git a/arch/x86/virt/Makefile b/arch/x86/virt/Makefile
> new file mode 100644
> index 000000000000..1e36502cd738
> --- /dev/null
> +++ b/arch/x86/virt/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-y	+= vmx/
> diff --git a/arch/x86/virt/vmx/Makefile b/arch/x86/virt/vmx/Makefile
> new file mode 100644
> index 000000000000..feebda21d793
> --- /dev/null
> +++ b/arch/x86/virt/vmx/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_INTEL_TDX_HOST)	+= tdx/
> diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile
> new file mode 100644
> index 000000000000..93ca8b73e1f1
> --- /dev/null
> +++ b/arch/x86/virt/vmx/tdx/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-y += tdx.o
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> new file mode 100644
> index 000000000000..982d9c453b6b
> --- /dev/null
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright(c) 2022 Intel Corporation.
> + *
> + * Intel Trusted Domain Extensions (TDX) support
> + */
> +
> +#define pr_fmt(fmt)	"tdx: " fmt
> +
> +#include <linux/types.h>
> +#include <linux/init.h>
> +#include <linux/printk.h>
> +#include <asm/msr-index.h>
> +#include <asm/msr.h>
> +#include <asm/tdx.h>
> +#include "tdx.h"
> +
> +static u32 tdx_keyid_start __ro_after_init;
> +static u32 tdx_keyid_num __ro_after_init;
> +
> +/*
> + * Detect TDX private KeyIDs to see whether TDX has been enabled by the
> + * BIOS.  Both initializing the TDX module and running TDX guest require
> + * TDX private KeyID.

This comment is not right, sorry.

Talk about the function at a *HIGH* level.  Don't talk about every
little detailed facet of the function.  That's what the code is there for.

> + * TDX doesn't trust BIOS.  TDX verifies all configurations from BIOS
> + * are correct before enabling TDX on any core.  TDX requires the BIOS
> + * to correctly and consistently program TDX private KeyIDs on all CPU
> + * packages.  Unless there is a BIOS bug, detecting a valid TDX private
> + * KeyID range on BSP indicates TDX has been enabled by the BIOS.  If
> + * there's such BIOS bug, it will be caught later when initializing the
> + * TDX module.
> + */

I have no idea what that comment is doing.  Can it just be removed?

> +static int __init detect_tdx(void)
> +{
> +	int ret;
> +
> +	/*
> +	 * IA32_MKTME_KEYID_PARTIONING:
> +	 *   Bit [31:0]:	Number of MKTME KeyIDs.
> +	 *   Bit [63:32]:	Number of TDX private KeyIDs.
> +	 */
> +	ret = rdmsr_safe(MSR_IA32_MKTME_KEYID_PARTITIONING, &tdx_keyid_start,
> +			&tdx_keyid_num);

'tdx_keyid_start' appears to be named wrong.

> +	if (ret)
> +		return -ENODEV;
> +
> +	if (!tdx_keyid_num)
> +		return -ENODEV;
> +
> +	/*
> +	 * KeyID 0 is for TME.  MKTME KeyIDs start from 1.  TDX private
> +	 * KeyIDs start after the last MKTME KeyID.
> +	 */

Is the TME key a "MKTME KeyID"?

> +	tdx_keyid_start++;

... and this confirms it.

This probably should be:

	u32 nr_mktme_keyids;

	ret = rdmsr_safe(MSR_IA32_MKTME_KEYID_PARTITIONING,
					&nr_mktme_keyids,
					&tdx_keyid_num);
	...

	/* TDX KeyIDs start after the last MKTME KeyID */
	tdx_keyid_start = nr_mktme_keyids + 1;

See how that makes actual logical sense and barely even needs the comment?


> +	pr_info("TDX enabled by BIOS. TDX private KeyID range: [%u, %u)\n",
> +			tdx_keyid_start, tdx_keyid_start + tdx_keyid_num);
> +
> +	return 0;
> +}
> +
> +static void __init clear_tdx(void)
> +{
> +	tdx_keyid_start = tdx_keyid_num = 0;
> +}

This is where a comment is needed and can actually help.

/*
 * tdx_keyid_start/num indicate that TDX is uninitialized.  This
 * is used in TDX initialization error paths to take it from
 * initialized -> uninitialized.
 */

> +static int __init tdx_init(void)
> +{
> +	if (detect_tdx())
> +		return -ENODEV;

This reads as:

	if tdx is detected:
		return error


So, first, why bother having detect_tdx() return fancy -ERRNO codes if
they're going to be throw away?  You could at *least* do:


	int err;

	err = tdx_record_keyid_partioning();
	if (err)
		return err;

Note how tdx_record_keyid_partioning() actually talks about what the
function does.  There's also a recent trend in x86 land not to put
obvious prefixes on functions.  That would make the naming more or less
record_keyid_partioning().

I kinda like the consistent prefixes but Boris doesn't.

> +	/*
> +	 * Initializing the TDX module requires one TDX private KeyID.
> +	 * If there's only one TDX KeyID then after module initialization
> +	 * KVM won't be able to run any TDX guest, which makes the whole
> +	 * thing worthless.  Just disable TDX in this case.
> +	 */
> +	if (tdx_keyid_num < 2) {
> +		pr_info("Disable TDX as there's only one TDX private KeyID available.\n");
> +		goto no_tdx;
> +	}

'tdx_keyid_num' is a crummy name.  Here it reads like, "if the tdx keyid
number is < 2'.  Which is wrong.  A better name would be: nr_tdx_keyids

That's also a horrible error message.  You have:

+#define pr_fmt(fmt)	"tdx: " fmt

so that message will look like:

tdx: Disable TDX as there's only one TDX private KeyID available.

How many 'TDX' strings do we need in one message.  How about:

	pr_info("initialization failed: too few private KeyIDs available
(%d).\n", nr_tdx_keyids;

That gives a lot more information and removes the two redundant TDX strings.

 +no_tdx:
> +	clear_tdx();
> +	return -ENODEV;
> +}
> +early_initcall(tdx_init);
> +
> +/* Return whether the BIOS has enabled TDX */
> +bool platform_tdx_enabled(void)
> +{
> +	return !!tdx_keyid_num;
> +}
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> new file mode 100644
> index 000000000000..d00074abcb20
> --- /dev/null
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _X86_VIRT_TDX_H
> +#define _X86_VIRT_TDX_H
> +
> +/*
> + * This file contains both macros and data structures defined by the TDX
> + * architecture and Linux defined software data structures and functions.
> + * The two should not be mixed together for better readability.  The
> + * architectural definitions come first.
> + */
> +
> +/* MSR to report KeyID partitioning between MKTME and TDX */
> +#define MSR_IA32_MKTME_KEYID_PARTITIONING	0x00000087
> +
> +#endif
Huang, Kai Nov. 22, 2022, 11:28 a.m. UTC | #5
> > 
> > The TDX module is expected to be loaded by the BIOS when it enables TDX,
> > but the kernel needs to properly initialize it before it can be used to
> > create and run any TDX guests.  The TDX module will be initialized at
> > runtime by the user (i.e. KVM) on demand.
> 
> Calling KVM "the user" is a stretch.  How about we give actual user
> facts instead of filling this with i.e.'s when there's only one actual
> way it happens?
> 
> 	The TDX module will be initialized by the KVM subsystem when
> 	<insert actual trigger description here>.
> 
> > Add a new early_initcall(tdx_init) to do TDX early boot initialization.
> > Only detect TDX private KeyIDs for now.  Some other early checks will
> > follow up.
> 
> Just say what this patch is doing.  Don't try to
> 
> >  Also add a new function to report whether TDX has been
> > enabled by BIOS (TDX private KeyID range is valid).  Kexec() will also
> > need it to determine whether need to flush dirty cachelines that are
> > associated with any TDX private KeyIDs before booting to the new kernel.
> 
> That last sentence doesn't parse correctly.

Will do all above.  Please see updated patch at the bottom.


[...]

> > +/*
> > + * Detect TDX private KeyIDs to see whether TDX has been enabled by the
> > + * BIOS.  Both initializing the TDX module and running TDX guest require
> > + * TDX private KeyID.
> 
> This comment is not right, sorry.
> 
> Talk about the function at a *HIGH* level.  Don't talk about every
> little detailed facet of the function.  That's what the code is there for.
> 
> > + * TDX doesn't trust BIOS.  TDX verifies all configurations from BIOS
> > + * are correct before enabling TDX on any core.  TDX requires the BIOS
> > + * to correctly and consistently program TDX private KeyIDs on all CPU
> > + * packages.  Unless there is a BIOS bug, detecting a valid TDX private
> > + * KeyID range on BSP indicates TDX has been enabled by the BIOS.  If
> > + * there's such BIOS bug, it will be caught later when initializing the
> > + * TDX module.
> > + */
> 
> I have no idea what that comment is doing.  Can it just be removed?

Will remove this part and update the entire comment.  

Also will address all your other comments.  Please see the updated patch.

[...]

> 
> > +	/*
> > +	 * KeyID 0 is for TME.  MKTME KeyIDs start from 1.  TDX private
> > +	 * KeyIDs start after the last MKTME KeyID.
> > +	 */
> 
> Is the TME key a "MKTME KeyID"?

I don't think so.  Hardware handles TME KeyID 0 differently from non-0 MKTME
KeyIDs.  And PCONFIG only accept non-0 KeyIDs.

> 
> > +static void __init clear_tdx(void)
> > +{
> > +	tdx_keyid_start = tdx_keyid_num = 0;
> > +}
> 
> This is where a comment is needed and can actually help.
> 
> /*
>  * tdx_keyid_start/num indicate that TDX is uninitialized.  This
>  * is used in TDX initialization error paths to take it from
>  * initialized -> uninitialized.
>  */
> 

Just want to point out after removing the !x2apic_enabled() check, the only
thing need to do here is to detect/record the TDX KeyIDs.

And the purpose of this TDX boot-time initialization code is to provide
platform_tdx_enabled() function so that kexec() can use.

To distinguish boot-time TDX initialization from runtime TDX module
initialization, how about change the comment to below?

static void __init clear_tdx(void)
{
        /*
         * tdx_keyid_start and nr_tdx_keyids indicate that TDX is not
         * enabled by the BIOS.  This is used in TDX boot-time
         * initializatiton error paths to take it from enabled to not
         * enabled.
         */
        tdx_keyid_start = nr_tdx_keyids = 0;
}

[...]

And below is the updated patch.  How does it look to you?

==========================================================

    x86/virt/tdx: Detect TDX during kernel boot
    
    Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
    host and certain physical attacks.  A CPU-attested software module
    called 'the TDX module' runs inside a new isolated memory range as a
    trusted hypervisor to manage and run protected VMs.
    
    Pre-TDX Intel hardware has support for a memory encryption architecture
    called MKTME.  The memory encryption hardware underpinning MKTME is also
    used for Intel TDX.  TDX ends up "stealing" some of the physical address
    space from the MKTME architecture for crypto-protection to VMs.  The
    BIOS is responsible for partitioning the "KeyID" space between legacy
    MKTME and TDX.  The KeyIDs reserved for TDX are called 'TDX private
    KeyIDs' or 'TDX KeyIDs' for short.
    
    TDX doesn't trust the BIOS.  During machine boot, TDX verifies the TDX
    private KeyIDs are consistently and correctly programmed by the BIOS
    across all CPU packages before it enables TDX on any CPU core.  A valid
    TDX private KeyID range on BSP indicates TDX has been enabled by the
    BIOS, otherwise the BIOS is buggy.
    
    The TDX module is expected to be loaded by the BIOS when it enables TDX,
    but the kernel needs to properly initialize it before it can be used to
    create and run any TDX guests.  The TDX module will be initialized by
    the KVM subsystem when the KVM module is loaded.
    
    Add a new early_initcall(tdx_init) to detect the TDX private KeyIDs.
    Both TDX module initialization and creating TDX guest require to use TDX
    private KeyID.  Also add a function to report whether TDX is enabled by
    the BIOS (TDX KeyID range is valid).  Similar to AMD SME, kexec() will
    use it to determine whether cache flush is needed.
    
    To start to support TDX, create a new arch/x86/virt/vmx/tdx/tdx.c for
    TDX host kernel support.  Add a new Kconfig option CONFIG_INTEL_TDX_HOST
    to opt-in TDX host kernel support (to distinguish with TDX guest kernel
    support).  So far only KVM uses TDX.  Make the new config option depend
    on KVM_INTEL.
    
    Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Kai Huang <kai.huang@intel.com>

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 67745ceab0db..cced4ef3bfb2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1953,6 +1953,18 @@ config X86_SGX
 
          If unsure, say N.
 
+config INTEL_TDX_HOST
+       bool "Intel Trust Domain Extensions (TDX) host support"
+       depends on CPU_SUP_INTEL
+       depends on X86_64
+       depends on KVM_INTEL
+       help
+         Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
+         host and certain physical attacks.  This option enables necessary TDX
+         support in host kernel to run protected VMs.
+
+         If unsure, say N.
+
 config EFI
        bool "EFI runtime service support"
        depends on ACPI
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 415a5d138de4..38d3e8addc5f 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -246,6 +246,8 @@ archheaders:
 
 libs-y  += arch/x86/lib/
 
+core-y += arch/x86/virt/
+
 # drivers-y are linked after core-y
 drivers-$(CONFIG_MATH_EMULATION) += arch/x86/math-emu/
 drivers-$(CONFIG_PCI)            += arch/x86/pci/
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 25fd6070dc0b..4dfe2e794411 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -94,5 +94,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr,
unsigned long p1,
        return -ENODEV;
 }
 #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */
+
+#ifdef CONFIG_INTEL_TDX_HOST
+bool platform_tdx_enabled(void);
+#else  /* !CONFIG_INTEL_TDX_HOST */
+static inline bool platform_tdx_enabled(void) { return false; }
+#endif /* CONFIG_INTEL_TDX_HOST */
+
 #endif /* !__ASSEMBLY__ */
 #endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/virt/Makefile b/arch/x86/virt/Makefile
new file mode 100644
index 000000000000..1e36502cd738
--- /dev/null
+++ b/arch/x86/virt/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-y  += vmx/
diff --git a/arch/x86/virt/vmx/Makefile b/arch/x86/virt/vmx/Makefile
new file mode 100644
index 000000000000..feebda21d793
--- /dev/null
+++ b/arch/x86/virt/vmx/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_INTEL_TDX_HOST)   += tdx/
diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile
new file mode 100644
index 000000000000..93ca8b73e1f1
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-y += tdx.o
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
new file mode 100644
index 000000000000..a60611448111
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright(c) 2022 Intel Corporation.
+ *
+ * Intel Trusted Domain Extensions (TDX) support
+ */
+
+#define pr_fmt(fmt)    "tdx: " fmt
+
+#include <linux/types.h>
+#include <linux/cache.h>
+#include <linux/init.h>
+#include <linux/printk.h>
+#include <asm/msr.h>
+#include <asm/tdx.h>
+#include "tdx.h"
+
+static u32 tdx_keyid_start __ro_after_init;
+static u32 nr_tdx_keyids __ro_after_init;
+
+static int __init record_keyid_partitioning(void)
+{
+       u32 nr_mktme_keyids;
+       int ret;
+
+       /*
+        * IA32_MKTME_KEYID_PARTIONING:
+        *   Bit [31:0]:        Number of MKTME KeyIDs.
+        *   Bit [63:32]:       Number of TDX private KeyIDs.
+        */
+       ret = rdmsr_safe(MSR_IA32_MKTME_KEYID_PARTITIONING, &nr_mktme_keyids,
+                       &nr_tdx_keyids);
+       if (ret)
+               return -ENODEV;
+
+       if (!nr_tdx_keyids)
+               return -ENODEV;
+
+       /* TDX KeyIDs start after the last MKTME KeyID. */
+       tdx_keyid_start++;
+
+       pr_info("enabled: private KeyID range [%u, %u)\n",
+                       tdx_keyid_start, tdx_keyid_start + nr_tdx_keyids);
+
+       return 0;
+}
+
+static void __init clear_tdx(void)
+{
+       /*
+        * tdx_keyid_start and nr_tdx_keyids indicate that TDX is not
+        * enabled by the BIOS.  This is used in TDX boot-time
+        * initializatiton error paths to take it from enabled to not
+        * enabled.
+        */
+       tdx_keyid_start = nr_tdx_keyids = 0;
+}
+
+static int __init tdx_init(void)
+{
+       int err;
+
+       err = record_keyid_partitioning();
+       if (err)
+               return err;
+
+       /*
+        * Initializing the TDX module requires one TDX private KeyID.
+        * If there's only one TDX KeyID then after module initialization
+        * KVM won't be able to run any TDX guest, which makes the whole
+        * thing worthless.  Just disable TDX in this case.
+        */
+       if (nr_tdx_keyids < 2) {
+               pr_info("initialization failed: too few private KeyIDs available
(%d).\n",
+                               nr_tdx_keyids);
+               goto no_tdx;
+       }
+
+       return 0;
+no_tdx:
+       clear_tdx();
+       return -ENODEV;
+}
+early_initcall(tdx_init);
+
+/* Return whether the BIOS has enabled TDX */
+bool platform_tdx_enabled(void)
+{
+       return !!nr_tdx_keyids;
+}
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
new file mode 100644
index 000000000000..d00074abcb20
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _X86_VIRT_TDX_H
+#define _X86_VIRT_TDX_H
+
+/*
+ * This file contains both macros and data structures defined by the TDX
+ * architecture and Linux defined software data structures and functions.
+ * The two should not be mixed together for better readability.  The
+ * architectural definitions come first.
+ */
+
+/* MSR to report KeyID partitioning between MKTME and TDX */
+#define MSR_IA32_MKTME_KEYID_PARTITIONING      0x00000087
+
+#endif
Dave Hansen Nov. 22, 2022, 4:50 p.m. UTC | #6
On 11/22/22 03:28, Huang, Kai wrote:
>>> +	/*
>>> +	 * KeyID 0 is for TME.  MKTME KeyIDs start from 1.  TDX private
>>> +	 * KeyIDs start after the last MKTME KeyID.
>>> +	 */
>>
>> Is the TME key a "MKTME KeyID"?
> 
> I don't think so.  Hardware handles TME KeyID 0 differently from non-0 MKTME
> KeyIDs.  And PCONFIG only accept non-0 KeyIDs.

Let's say we have 4 MKTME hardware bits, we'd have:

   0: TME Key
1->3: MKTME Keys
4->7: TDX Private Keys

First, the MSR values:

> +        * IA32_MKTME_KEYID_PARTIONING:
> +        *   Bit [31:0]:        Number of MKTME KeyIDs.
> +        *   Bit [63:32]:       Number of TDX private KeyIDs.

These would be:

	Bit [ 31:0] = 3
	Bit [63:22] = 4

And in the end the variables:

	tdx_keyid_start would be 4 and tdx_keyid_num would be 4.

Right?

That's a bit wonky for my brain because I guess I know too much about
the internal implementation and how the key space is split up.  I guess
I (wrongly) expected Bit[31:0]==Bit[63:22].



>>> +static void __init clear_tdx(void)
>>> +{
>>> +	tdx_keyid_start = tdx_keyid_num = 0;
>>> +}
>>
>> This is where a comment is needed and can actually help.
>>
>> /*
>>  * tdx_keyid_start/num indicate that TDX is uninitialized.  This
>>  * is used in TDX initialization error paths to take it from
>>  * initialized -> uninitialized.
>>  */
> 
> Just want to point out after removing the !x2apic_enabled() check, the only
> thing need to do here is to detect/record the TDX KeyIDs.
> 
> And the purpose of this TDX boot-time initialization code is to provide
> platform_tdx_enabled() function so that kexec() can use.
> 
> To distinguish boot-time TDX initialization from runtime TDX module
> initialization, how about change the comment to below?
> 
> static void __init clear_tdx(void)
> {
>         /*
>          * tdx_keyid_start and nr_tdx_keyids indicate that TDX is not
>          * enabled by the BIOS.  This is used in TDX boot-time
>          * initializatiton error paths to take it from enabled to not
>          * enabled.
>          */
>         tdx_keyid_start = nr_tdx_keyids = 0;
> }
> 
> [...]

I honestly have no idea what "boot-time TDX initialization" is versus
"runtime TDX module initialization".  This doesn't hel.

> And below is the updated patch.  How does it look to you?

Let's see...

...
> +static u32 tdx_keyid_start __ro_after_init;
> +static u32 nr_tdx_keyids __ro_after_init;
> +
> +static int __init record_keyid_partitioning(void)
> +{
> +       u32 nr_mktme_keyids;
> +       int ret;
> +
> +       /*
> +        * IA32_MKTME_KEYID_PARTIONING:
> +        *   Bit [31:0]:        Number of MKTME KeyIDs.
> +        *   Bit [63:32]:       Number of TDX private KeyIDs.
> +        */
> +       ret = rdmsr_safe(MSR_IA32_MKTME_KEYID_PARTITIONING, &nr_mktme_keyids,
> +                       &nr_tdx_keyids);
> +       if (ret)
> +               return -ENODEV;
> +
> +       if (!nr_tdx_keyids)
> +               return -ENODEV;
> +
> +       /* TDX KeyIDs start after the last MKTME KeyID. */
> +       tdx_keyid_start++;

tdx_keyid_start is uniniitalized here.  So, it'd be 0, then ++'d.

Kai, please take a moment and slow down.  This isn't a race.  I offered
some replacement code here, which you've discarded, missed or ignored
and in the process broken this code.

This approach just wastes reviewer time.  It's not working for me.

I'm going to make a suggestion (aka. a demand): You can post these
patches at most once a week.  You get a whole week to (carefully)
incorporate reviewer feedback, make the patch better, and post a new
version.  Need more time?  Go ahead and take it.  Take as much time as
you want.
Huang, Kai Nov. 22, 2022, 11:21 p.m. UTC | #7
On Tue, 2022-11-22 at 08:50 -0800, Dave Hansen wrote:
> On 11/22/22 03:28, Huang, Kai wrote:
> > > > +	/*
> > > > +	 * KeyID 0 is for TME.  MKTME KeyIDs start from 1.  TDX private
> > > > +	 * KeyIDs start after the last MKTME KeyID.
> > > > +	 */
> > > 
> > > Is the TME key a "MKTME KeyID"?
> > 
> > I don't think so.  Hardware handles TME KeyID 0 differently from non-0 MKTME
> > KeyIDs.  And PCONFIG only accept non-0 KeyIDs.
> 
> Let's say we have 4 MKTME hardware bits, we'd have:
> 
>    0: TME Key
> 1->3: MKTME Keys
> 4->7: TDX Private Keys
> 
> First, the MSR values:
> 
> > +        * IA32_MKTME_KEYID_PARTIONING:
> > +        *   Bit [31:0]:        Number of MKTME KeyIDs.
> > +        *   Bit [63:32]:       Number of TDX private KeyIDs.
> 
> These would be:
> 
> 	Bit [ 31:0] = 3
> 	Bit [63:22] = 4
> 
> And in the end the variables:
> 
> 	tdx_keyid_start would be 4 and tdx_keyid_num would be 4.
> 
> Right?

Yes.

> 
> That's a bit wonky for my brain because I guess I know too much about
> the internal implementation and how the key space is split up.  I guess
> I (wrongly) expected Bit[31:0]==Bit[63:22].

The spec says the The Bit[31:0] only reports the number of MKTME KeyIDs, and it
does exclude KeyID 0.

My machine has 6 hardware bits in total (that is KeyID 0 ~ 63), and the upper 48
KeyIDs are reserved to TDX.  In my case:

	[Bit 31:0] = 15
	[Bit 63:32] = 48

And tdx_keyid_start and nr_tdx_keyids are 16 and 48.

The TDX KeyID range: [16, 63], or [16, 64).

So [Bit 31:0] reports only "NUM_MKTME_KIDS", which excludes KeyID 0.

> 
> 
> 
> > > > +static void __init clear_tdx(void)
> > > > +{
> > > > +	tdx_keyid_start = tdx_keyid_num = 0;
> > > > +}
> > > 
> > > This is where a comment is needed and can actually help.
> > > 
> > > /*
> > >  * tdx_keyid_start/num indicate that TDX is uninitialized.  This
> > >  * is used in TDX initialization error paths to take it from
> > >  * initialized -> uninitialized.
> > >  */
> > 
> > Just want to point out after removing the !x2apic_enabled() check, the only
> > thing need to do here is to detect/record the TDX KeyIDs.
> > 
> > And the purpose of this TDX boot-time initialization code is to provide
> > platform_tdx_enabled() function so that kexec() can use.
> > 
> > To distinguish boot-time TDX initialization from runtime TDX module
> > initialization, how about change the comment to below?
> > 
> > static void __init clear_tdx(void)
> > {
> >         /*
> >          * tdx_keyid_start and nr_tdx_keyids indicate that TDX is not
> >          * enabled by the BIOS.  This is used in TDX boot-time
> >          * initializatiton error paths to take it from enabled to not
> >          * enabled.
> >          */
> >         tdx_keyid_start = nr_tdx_keyids = 0;
> > }
> > 
> > [...]
> 
> I honestly have no idea what "boot-time TDX initialization" is versus
> "runtime TDX module initialization".  This doesn't hel.

I'll use your original comment.

> 
> > And below is the updated patch.  How does it look to you?
> 
> Let's see...
> 
> ...
> > +static u32 tdx_keyid_start __ro_after_init;
> > +static u32 nr_tdx_keyids __ro_after_init;
> > +
> > +static int __init record_keyid_partitioning(void)
> > +{
> > +       u32 nr_mktme_keyids;
> > +       int ret;
> > +
> > +       /*
> > +        * IA32_MKTME_KEYID_PARTIONING:
> > +        *   Bit [31:0]:        Number of MKTME KeyIDs.
> > +        *   Bit [63:32]:       Number of TDX private KeyIDs.
> > +        */
> > +       ret = rdmsr_safe(MSR_IA32_MKTME_KEYID_PARTITIONING, &nr_mktme_keyids,
> > +                       &nr_tdx_keyids);
> > +       if (ret)
> > +               return -ENODEV;
> > +
> > +       if (!nr_tdx_keyids)
> > +               return -ENODEV;
> > +
> > +       /* TDX KeyIDs start after the last MKTME KeyID. */
> > +       tdx_keyid_start++;
> 
> tdx_keyid_start is uniniitalized here.  So, it'd be 0, then ++'d.
> 
> Kai, please take a moment and slow down.  This isn't a race.  I offered
> some replacement code here, which you've discarded, missed or ignored
> and in the process broken this code.
> 
> This approach just wastes reviewer time.  It's not working for me.

Apology.  I missed it this time.

> 
> I'm going to make a suggestion (aka. a demand): You can post these
> patches at most once a week.  You get a whole week to (carefully)
> incorporate reviewer feedback, make the patch better, and post a new
> version.  Need more time?  Go ahead and take it.  Take as much time as
> you want.
> 

Yes will follow.
diff mbox series

Patch

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 67745ceab0db..cced4ef3bfb2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1953,6 +1953,18 @@  config X86_SGX
 
 	  If unsure, say N.
 
+config INTEL_TDX_HOST
+	bool "Intel Trust Domain Extensions (TDX) host support"
+	depends on CPU_SUP_INTEL
+	depends on X86_64
+	depends on KVM_INTEL
+	help
+	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
+	  host and certain physical attacks.  This option enables necessary TDX
+	  support in host kernel to run protected VMs.
+
+	  If unsure, say N.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 415a5d138de4..38d3e8addc5f 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -246,6 +246,8 @@  archheaders:
 
 libs-y  += arch/x86/lib/
 
+core-y += arch/x86/virt/
+
 # drivers-y are linked after core-y
 drivers-$(CONFIG_MATH_EMULATION) += arch/x86/math-emu/
 drivers-$(CONFIG_PCI)            += arch/x86/pci/
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index e9a3f4a6fba1..51c4222a13ae 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -98,5 +98,12 @@  static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 	return -ENODEV;
 }
 #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */
+
+#ifdef CONFIG_INTEL_TDX_HOST
+bool platform_tdx_enabled(void);
+#else	/* !CONFIG_INTEL_TDX_HOST */
+static inline bool platform_tdx_enabled(void) { return false; }
+#endif	/* CONFIG_INTEL_TDX_HOST */
+
 #endif /* !__ASSEMBLY__ */
 #endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/virt/Makefile b/arch/x86/virt/Makefile
new file mode 100644
index 000000000000..1e36502cd738
--- /dev/null
+++ b/arch/x86/virt/Makefile
@@ -0,0 +1,2 @@ 
+# SPDX-License-Identifier: GPL-2.0-only
+obj-y	+= vmx/
diff --git a/arch/x86/virt/vmx/Makefile b/arch/x86/virt/vmx/Makefile
new file mode 100644
index 000000000000..feebda21d793
--- /dev/null
+++ b/arch/x86/virt/vmx/Makefile
@@ -0,0 +1,2 @@ 
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_INTEL_TDX_HOST)	+= tdx/
diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile
new file mode 100644
index 000000000000..93ca8b73e1f1
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/Makefile
@@ -0,0 +1,2 @@ 
+# SPDX-License-Identifier: GPL-2.0-only
+obj-y += tdx.o
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
new file mode 100644
index 000000000000..982d9c453b6b
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -0,0 +1,95 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright(c) 2022 Intel Corporation.
+ *
+ * Intel Trusted Domain Extensions (TDX) support
+ */
+
+#define pr_fmt(fmt)	"tdx: " fmt
+
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/printk.h>
+#include <asm/msr-index.h>
+#include <asm/msr.h>
+#include <asm/tdx.h>
+#include "tdx.h"
+
+static u32 tdx_keyid_start __ro_after_init;
+static u32 tdx_keyid_num __ro_after_init;
+
+/*
+ * Detect TDX private KeyIDs to see whether TDX has been enabled by the
+ * BIOS.  Both initializing the TDX module and running TDX guest require
+ * TDX private KeyID.
+ *
+ * TDX doesn't trust BIOS.  TDX verifies all configurations from BIOS
+ * are correct before enabling TDX on any core.  TDX requires the BIOS
+ * to correctly and consistently program TDX private KeyIDs on all CPU
+ * packages.  Unless there is a BIOS bug, detecting a valid TDX private
+ * KeyID range on BSP indicates TDX has been enabled by the BIOS.  If
+ * there's such BIOS bug, it will be caught later when initializing the
+ * TDX module.
+ */
+static int __init detect_tdx(void)
+{
+	int ret;
+
+	/*
+	 * IA32_MKTME_KEYID_PARTIONING:
+	 *   Bit [31:0]:	Number of MKTME KeyIDs.
+	 *   Bit [63:32]:	Number of TDX private KeyIDs.
+	 */
+	ret = rdmsr_safe(MSR_IA32_MKTME_KEYID_PARTITIONING, &tdx_keyid_start,
+			&tdx_keyid_num);
+	if (ret)
+		return -ENODEV;
+
+	if (!tdx_keyid_num)
+		return -ENODEV;
+
+	/*
+	 * KeyID 0 is for TME.  MKTME KeyIDs start from 1.  TDX private
+	 * KeyIDs start after the last MKTME KeyID.
+	 */
+	tdx_keyid_start++;
+
+	pr_info("TDX enabled by BIOS. TDX private KeyID range: [%u, %u)\n",
+			tdx_keyid_start, tdx_keyid_start + tdx_keyid_num);
+
+	return 0;
+}
+
+static void __init clear_tdx(void)
+{
+	tdx_keyid_start = tdx_keyid_num = 0;
+}
+
+static int __init tdx_init(void)
+{
+	if (detect_tdx())
+		return -ENODEV;
+
+	/*
+	 * Initializing the TDX module requires one TDX private KeyID.
+	 * If there's only one TDX KeyID then after module initialization
+	 * KVM won't be able to run any TDX guest, which makes the whole
+	 * thing worthless.  Just disable TDX in this case.
+	 */
+	if (tdx_keyid_num < 2) {
+		pr_info("Disable TDX as there's only one TDX private KeyID available.\n");
+		goto no_tdx;
+	}
+
+	return 0;
+no_tdx:
+	clear_tdx();
+	return -ENODEV;
+}
+early_initcall(tdx_init);
+
+/* Return whether the BIOS has enabled TDX */
+bool platform_tdx_enabled(void)
+{
+	return !!tdx_keyid_num;
+}
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
new file mode 100644
index 000000000000..d00074abcb20
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -0,0 +1,15 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _X86_VIRT_TDX_H
+#define _X86_VIRT_TDX_H
+
+/*
+ * This file contains both macros and data structures defined by the TDX
+ * architecture and Linux defined software data structures and functions.
+ * The two should not be mixed together for better readability.  The
+ * architectural definitions come first.
+ */
+
+/* MSR to report KeyID partitioning between MKTME and TDX */
+#define MSR_IA32_MKTME_KEYID_PARTITIONING	0x00000087
+
+#endif