diff mbox series

[3/3] KVM: VMX: Initialize TDX during KVM module load

Message ID f7394b88a22e52774f23854950d45c1bfeafe42c.1730120881.git.kai.huang@intel.com (mailing list archive)
State New
Headers show
Series KVM: VMX: Initialize TDX when loading KVM module | expand

Commit Message

Huang, Kai Oct. 28, 2024, 1:20 p.m. UTC
Before KVM can use TDX to create and run TDX guests, TDX needs to be
initialized from two perspectives: 1) TDX module must be initialized
properly to a working state; 2) A per-cpu TDX initialization, a.k.a the
TDH.SYS.LP.INIT SEAMCALL must be done on any logical cpu before it can
run any other TDX SEAMCALLs.

The TDX host core-kernel provides two functions to do above two
respectively: tdx_enable() and tdx_cpu_enable().

There are two options in terms of when to initialize TDX: initialize TDX
at KVM module loading time, or when creating the first TDX guest.

Choose to initialize TDX during KVM module loading time:

Initializing TDX module is both memory and CPU time consuming: 1) the
kernel needs to allocate a non-trivial size(~1/256) of system memory
as metadata used by TDX module to track each TDX-usable memory page's
status; 2) the TDX module needs to initialize this metadata, one entry
for each TDX-usable memory page.

Also, the kernel uses alloc_contig_pages() to allocate those metadata
chunks, because they are large and need to be physically contiguous.

alloc_contig_pages() can fail.  If initializing TDX when creating the
first TDX guest, then there's chance that KVM won't be able to run any
TDX guests albeit KVM _declares_ to be able to support TDX.

This isn't good to the user.  On the other hand, initializing TDX at KVM
module loading time can make sure KVM is providing a consistent view of
whether KVM can support TDX to the user.

Always only try to initialize TDX after VMX has been initialized.  TDX
is based on VMX, and if VMX fails to initialize then TDX is likely to be
broken anyway.  Also, in practice, supporting TDX will require part of
VMX and common x86 infrastructure in working order, so TDX cannot be
sololy w/o VMX support.

Specifically, initialize TDX after VMX has been initialized and before
kvm_init() in vt_init().  Don't fail the whole vt_init() if TDX fails to
initialize, since in this case KVM can still support normal VMX guests.

Because TDX costs additional memory, don't enable TDX by default.  Add a
new module parameter 'enable_tdx' to allow the user to opt-in.

Register a new TDX-specific cpuhp callback to run tdx_cpu_enable(), and
call tdx_enable() after that to ensure tdx_cpu_enable() has been done
for all online CPUs before making the tdx_enable().  Use a dynamic cpuhp
state for TDX so that KVM's cpuhp callback to enable VMX on new online
CPU can happen before tdx_cpu_enable().

Note, the name tdx_init() has already been taken by the early boot code.
Use tdx_bringup() for initializing TDX (and tdx_cleanup() since KVM
doesn't actually teardown TDX).  They don't match vt_init()/vt_exit(),
vmx_init()/vmx_exit() etc but it's not end of the world.

Also, once initialized, the TDX module cannot be disabled and enabled
again w/o the TDX module runtime update, which isn't supported by the
kernel.  After TDX is enabled, nothing needs to be done when KVM
disables hardware virtualization, e.g., when offlining CPU, or during
suspend/resume.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kvm/Makefile   |   1 +
 arch/x86/kvm/vmx/main.c |   6 +++
 arch/x86/kvm/vmx/tdx.c  | 115 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.h  |  12 +++++
 4 files changed, 134 insertions(+)
 create mode 100644 arch/x86/kvm/vmx/tdx.c
 create mode 100644 arch/x86/kvm/vmx/tdx.h

Comments

Sean Christopherson Oct. 30, 2024, 3:19 p.m. UTC | #1
On Tue, Oct 29, 2024, Kai Huang wrote:
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index f9dddb8cb466..fec803aff7ad 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -20,6 +20,7 @@ kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
>  
>  kvm-intel-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
>  kvm-intel-$(CONFIG_KVM_HYPERV)	+= vmx/hyperv.o vmx/hyperv_evmcs.o
> +kvm-intel-$(CONFIG_INTEL_TDX_HOST)	+= vmx/tdx.o

IMO, INTEL_TDX_HOST should be a KVM Kconfig, e.g. KVM_INTEL_TDX.  Forcing the user
to bounce between KVM's menu and the generic menu to enable KVM support for TDX is
kludgy.  Having INTEL_TDX_HOST exist before KVM support came along made sense, as
it allowed compile-testing a bunch of code, but I don't think it should be the end
state.

If others disagree, then we should adjust KVM_AMD_SEV in the opposite direction,
because doing different things for SEV vs. TDX is confusing and messy.

>  kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o
>  
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index 433ecbd90905..053294939eb1 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -6,6 +6,7 @@
>  #include "nested.h"
>  #include "pmu.h"
>  #include "posted_intr.h"
> +#include "tdx.h"
>  
>  #define VMX_REQUIRED_APICV_INHIBITS				\
>  	(BIT(APICV_INHIBIT_REASON_DISABLED) |			\
> @@ -170,6 +171,7 @@ struct kvm_x86_init_ops vt_init_ops __initdata = {
>  static void vt_exit(void)
>  {
>  	kvm_exit();
> +	tdx_cleanup();
>  	vmx_exit();
>  }
>  module_exit(vt_exit);
> @@ -182,6 +184,9 @@ static int __init vt_init(void)
>  	if (r)
>  		return r;
>  
> +	/* tdx_init() has been taken */
> +	tdx_bringup();

tdx_module_init()?  And honestly, even though Linux doesn't currently support
unloading the TDX module, I think tdx_module_exit() is a perfectly fine name,
because not being able to unload the TDX module and reclaim all of that memory
is a flaw that should be addressed at some point.
> +static enum cpuhp_state tdx_cpuhp_state;
> +
> +static int tdx_online_cpu(unsigned int cpu)
> +{
> +	unsigned long flags;
> +	int r;
> +
> +	/* Sanity check CPU is already in post-VMXON */
> +	WARN_ON_ONCE(!(cr4_read_shadow() & X86_CR4_VMXE));
> +
> +	/* tdx_cpu_enable() must be called with IRQ disabled */

I don't find this comment helpfu.  If it explained _why_ tdx_cpu_enable() requires
IRQs to be disabled, then I'd feel differently, but as is, IMO it doesn't add value.

> +	local_irq_save(flags);
> +	r = tdx_cpu_enable();
> +	local_irq_restore(flags);
> +
> +	return r;
> +}
> +

...

> +static int __init __do_tdx_bringup(void)
> +{
> +	int r;
> +
> +	/*
> +	 * TDX-specific cpuhp callback to call tdx_cpu_enable() on all
> +	 * online CPUs before calling tdx_enable(), and on any new
> +	 * going-online CPU to make sure it is ready for TDX guest.
> +	 */
> +	r = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN,
> +					 "kvm/cpu/tdx:online",
> +					 tdx_online_cpu, NULL);
> +	if (r < 0)
> +		return r;
> +
> +	tdx_cpuhp_state = r;
> +
> +	/* tdx_enable() must be called with cpus_read_lock() */

This comment is even less valuable, IMO.

> +	r = tdx_enable();
> +	if (r)
> +		__do_tdx_cleanup();
> +
> +	return r;
> +}
> +
> +static int __init __tdx_bringup(void)
> +{
> +	int r;
> +
> +	if (!enable_ept) {
> +		pr_err("Cannot enable TDX with EPT disabled.\n");

Why wait until now to check for EPT?  Force enable_tdx to false if enable_ept is
false, don't fail the module load.

> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Enabling TDX requires enabling hardware virtualization first,
> +	 * as making SEAMCALLs requires CPU being in post-VMXON state.
> +	 */
> +	r = kvm_enable_virtualization();
> +	if (r)
> +		return r;
> +
> +	cpus_read_lock();
> +	r = __do_tdx_bringup();
> +	cpus_read_unlock();
> +
> +	if (r)
> +		goto tdx_bringup_err;
> +
> +	/*
> +	 * Leave hardware virtualization enabled after TDX is enabled
> +	 * successfully.  TDX CPU hotplug depends on this.
> +	 */
> +	return 0;
> +tdx_bringup_err:
> +	kvm_disable_virtualization();
> +	return r;
> +}
> +
> +void tdx_cleanup(void)
> +{
> +	if (enable_tdx) {
> +		__do_tdx_cleanup();
> +		kvm_disable_virtualization();
> +	}
> +}
> +
> +void __init tdx_bringup(void)
> +{
> +	enable_tdx = enable_tdx && !__tdx_bringup();

Ah.  I don't love this approach because it mixes "failure" due to an unsupported
configuration, with failure due to unexpected issues.  E.g. if enabling virtualization
fails, loading KVM-the-module absolutely should fail too, not simply disable TDX.
Huang, Kai Oct. 31, 2024, 11:17 a.m. UTC | #2
On Wed, 2024-10-30 at 08:19 -0700, Sean Christopherson wrote:
> On Tue, Oct 29, 2024, Kai Huang wrote:
> > diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> > index f9dddb8cb466..fec803aff7ad 100644
> > --- a/arch/x86/kvm/Makefile
> > +++ b/arch/x86/kvm/Makefile
> > @@ -20,6 +20,7 @@ kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
> >  
> >  kvm-intel-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
> >  kvm-intel-$(CONFIG_KVM_HYPERV)	+= vmx/hyperv.o vmx/hyperv_evmcs.o
> > +kvm-intel-$(CONFIG_INTEL_TDX_HOST)	+= vmx/tdx.o
> 
> IMO, INTEL_TDX_HOST should be a KVM Kconfig, e.g. KVM_INTEL_TDX.  Forcing the user
> to bounce between KVM's menu and the generic menu to enable KVM support for TDX is
> kludgy.  Having INTEL_TDX_HOST exist before KVM support came along made sense, as
> it allowed compile-testing a bunch of code, but I don't think it should be the end
> state.
> 
> If others disagree, then we should adjust KVM_AMD_SEV in the opposite direction,
> because doing different things for SEV vs. TDX is confusing and messy.

+ Dave (and Dan for TDX Connect).

Agree SEV/TDX should be in similar way.  But also I find SEV has a dependency on
CRYPTO_DEV_SP_PSP, so perhaps it also reasonable to make an additional
KVM_INTEL_TDX and make it depend on INTEL_TDX_HOST?

We could remove INTEL_TDX_HOST but only keep KVM_INTEL_TDX.  But in the long
term, more kernel components will need to add TDX support (e.g., for TDX
Connect).  I think the question is whether we can safely disable TDX code in ALL
kernel components when KVM_INTEL_TDX is not enabled.

If the answer is yes (seems correct to me, because it seems meaningless to
enable TDX code in _ANY_ kernel components when it's even possible to run TDX 
guest), then I think we can just change the current INTEL_TDX_HOST to
KVM_INTEL_TDX and put it in arch/x86/kvm/Kconfig.

Hi Dave, Dan,

Do you have any comments?

> 
> >  kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o
> >  
> > diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> > index 433ecbd90905..053294939eb1 100644
> > --- a/arch/x86/kvm/vmx/main.c
> > +++ b/arch/x86/kvm/vmx/main.c
> > @@ -6,6 +6,7 @@
> >  #include "nested.h"
> >  #include "pmu.h"
> >  #include "posted_intr.h"
> > +#include "tdx.h"
> >  
> >  #define VMX_REQUIRED_APICV_INHIBITS				\
> >  	(BIT(APICV_INHIBIT_REASON_DISABLED) |			\
> > @@ -170,6 +171,7 @@ struct kvm_x86_init_ops vt_init_ops __initdata = {
> >  static void vt_exit(void)
> >  {
> >  	kvm_exit();
> > +	tdx_cleanup();
> >  	vmx_exit();
> >  }
> >  module_exit(vt_exit);
> > @@ -182,6 +184,9 @@ static int __init vt_init(void)
> >  	if (r)
> >  		return r;
> >  
> > +	/* tdx_init() has been taken */
> > +	tdx_bringup();
> 
> tdx_module_init()?  And honestly, even though Linux doesn't currently support
> unloading the TDX module, I think tdx_module_exit() is a perfectly fine name,
> because not being able to unload the TDX module and reclaim all of that memory
> is a flaw that should be addressed at some point.

tdx_module_init()/exit() also work for me.

Or is vt_tdx_init()/exit() better?  We can rename vmx_init()/exit() to
vt_vmx_init()/exit() if needed.

> > +static enum cpuhp_state tdx_cpuhp_state;
> > +
> > +static int tdx_online_cpu(unsigned int cpu)
> > +{
> > +	unsigned long flags;
> > +	int r;
> > +
> > +	/* Sanity check CPU is already in post-VMXON */
> > +	WARN_ON_ONCE(!(cr4_read_shadow() & X86_CR4_VMXE));
> > +
> > +	/* tdx_cpu_enable() must be called with IRQ disabled */
> 
> I don't find this comment helpfu.  If it explained _why_ tdx_cpu_enable() requires
> IRQs to be disabled, then I'd feel differently, but as is, IMO it doesn't add value.

I'll remove the comment.

> 
> > +	local_irq_save(flags);
> > +	r = tdx_cpu_enable();
> > +	local_irq_restore(flags);
> > +
> > +	return r;
> > +}
> > +
> 
> ...
> 
> > +static int __init __do_tdx_bringup(void)
> > +{
> > +	int r;
> > +
> > +	/*
> > +	 * TDX-specific cpuhp callback to call tdx_cpu_enable() on all
> > +	 * online CPUs before calling tdx_enable(), and on any new
> > +	 * going-online CPU to make sure it is ready for TDX guest.
> > +	 */
> > +	r = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN,
> > +					 "kvm/cpu/tdx:online",
> > +					 tdx_online_cpu, NULL);
> > +	if (r < 0)
> > +		return r;
> > +
> > +	tdx_cpuhp_state = r;
> > +
> > +	/* tdx_enable() must be called with cpus_read_lock() */
> 
> This comment is even less valuable, IMO.

Will remove.

> 
> > +	r = tdx_enable();
> > +	if (r)
> > +		__do_tdx_cleanup();
> > +
> > +	return r;
> > +}
> > +
> > 

[...]

> > +void __init tdx_bringup(void)
> > +{
> > +	enable_tdx = enable_tdx && !__tdx_bringup();
> 
> Ah.  I don't love this approach because it mixes "failure" due to an unsupported
> configuration, with failure due to unexpected issues.  E.g. if enabling virtualization
> fails, loading KVM-the-module absolutely should fail too, not simply disable TDX.

Thanks for the comments.

I see your point.  However for "enabling virtualization failure" kvm_init() will
also try to do (default behaviour), so if it fails it will result in module
loading failure eventually.  So while I guess it would be slightly better to
make module loading fail if "enabling virtualization fails" in TDX, it is a nit
issue to me.

I think "enabling virtualization failure" is the only "unexpected issue" that
should result in module loading failure.  For any other TDX-specific
initialization failure (e.g., any memory allocation in future patches) it's
better to only disable TDX.

So I can change to "make loading KVM-the-module fail if enabling virtualization
fails in TDX", but I want to confirm this is what you want?
Sean Christopherson Oct. 31, 2024, 8:22 p.m. UTC | #3
On Thu, Oct 31, 2024, Kai Huang wrote:
> On Wed, 2024-10-30 at 08:19 -0700, Sean Christopherson wrote:
> > > +void __init tdx_bringup(void)
> > > +{
> > > +	enable_tdx = enable_tdx && !__tdx_bringup();
> > 
> > Ah.  I don't love this approach because it mixes "failure" due to an unsupported
> > configuration, with failure due to unexpected issues.  E.g. if enabling virtualization
> > fails, loading KVM-the-module absolutely should fail too, not simply disable TDX.
> 
> Thanks for the comments.
> 
> I see your point.  However for "enabling virtualization failure" kvm_init() will
> also try to do (default behaviour), so if it fails it will result in module
> loading failure eventually.  So while I guess it would be slightly better to
> make module loading fail if "enabling virtualization fails" in TDX, it is a nit
> issue to me.
> 
> I think "enabling virtualization failure" is the only "unexpected issue" that
> should result in module loading failure.  For any other TDX-specific
> initialization failure (e.g., any memory allocation in future patches) it's
> better to only disable TDX.

I disagree.  The platform owner wants TDX to be enabled, KVM shouldn't silently
disable TDX because of a transient, unrelated failure.

If TDX _can't_ be supported, e.g. because EPT or MMIO SPTE caching was explicitly
disable, then that's different.  And that's the general pattern throughout KVM.
If a requested feature isn't supported, then KVM continues on updates the module
param accordingly.  But if something outright fails during setup, KVM aborts the
entire sequence.

> So I can change to "make loading KVM-the-module fail if enabling virtualization
> fails in TDX", but I want to confirm this is what you want?

I would prefer the logic to be: reject loading kvm-intel.ko if an operation that
would normally succeed, fails.
Huang, Kai Oct. 31, 2024, 9:21 p.m. UTC | #4
On 1/11/2024 9:22 am, Sean Christopherson wrote:
> On Thu, Oct 31, 2024, Kai Huang wrote:
>> On Wed, 2024-10-30 at 08:19 -0700, Sean Christopherson wrote:
>>>> +void __init tdx_bringup(void)
>>>> +{
>>>> +	enable_tdx = enable_tdx && !__tdx_bringup();
>>>
>>> Ah.  I don't love this approach because it mixes "failure" due to an unsupported
>>> configuration, with failure due to unexpected issues.  E.g. if enabling virtualization
>>> fails, loading KVM-the-module absolutely should fail too, not simply disable TDX.
>>
>> Thanks for the comments.
>>
>> I see your point.  However for "enabling virtualization failure" kvm_init() will
>> also try to do (default behaviour), so if it fails it will result in module
>> loading failure eventually.  So while I guess it would be slightly better to
>> make module loading fail if "enabling virtualization fails" in TDX, it is a nit
>> issue to me.
>>
>> I think "enabling virtualization failure" is the only "unexpected issue" that
>> should result in module loading failure.  For any other TDX-specific
>> initialization failure (e.g., any memory allocation in future patches) it's
>> better to only disable TDX.
> 
> I disagree.  The platform owner wants TDX to be enabled, KVM shouldn't silently
> disable TDX because of a transient, unrelated failure.
> 
> If TDX _can't_ be supported, e.g. because EPT or MMIO SPTE caching was explicitly
> disable, then that's different.  And that's the general pattern throughout KVM.
> If a requested feature isn't supported, then KVM continues on updates the module
> param accordingly.  But if something outright fails during setup, KVM aborts the
> entire sequence.
> 
>> So I can change to "make loading KVM-the-module fail if enabling virtualization
>> fails in TDX", but I want to confirm this is what you want?
> 
> I would prefer the logic to be: reject loading kvm-intel.ko if an operation that
> would normally succeed, fails.

OK will change to what you suggested.  I'll need to take a deeper look 
though since later patches will add more checks.

Thanks for the comments!
Edgecombe, Rick P Oct. 31, 2024, 9:29 p.m. UTC | #5
Paolo,

On Fri, 2024-11-01 at 10:21 +1300, Huang, Kai wrote:
> > 
> > I would prefer the logic to be: reject loading kvm-intel.ko if an operation
> > that
> > would normally succeed, fails.
> 
> OK will change to what you suggested.  I'll need to take a deeper look 
> though since later patches will add more checks.
> 
> Thanks for the comments!

We had talked about you taking this series over, under the assumption that it
was mostly settled. Would it help for us to spin another branch with these
changes?
Dan Williams Oct. 31, 2024, 9:52 p.m. UTC | #6
Huang, Kai wrote:
> On Wed, 2024-10-30 at 08:19 -0700, Sean Christopherson wrote:
> > On Tue, Oct 29, 2024, Kai Huang wrote:
> > > diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> > > index f9dddb8cb466..fec803aff7ad 100644
> > > --- a/arch/x86/kvm/Makefile
> > > +++ b/arch/x86/kvm/Makefile
> > > @@ -20,6 +20,7 @@ kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
> > >  
> > >  kvm-intel-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
> > >  kvm-intel-$(CONFIG_KVM_HYPERV)	+= vmx/hyperv.o vmx/hyperv_evmcs.o
> > > +kvm-intel-$(CONFIG_INTEL_TDX_HOST)	+= vmx/tdx.o
> > 
> > IMO, INTEL_TDX_HOST should be a KVM Kconfig, e.g. KVM_INTEL_TDX.  Forcing the user
> > to bounce between KVM's menu and the generic menu to enable KVM support for TDX is
> > kludgy.  Having INTEL_TDX_HOST exist before KVM support came along made sense, as
> > it allowed compile-testing a bunch of code, but I don't think it should be the end
> > state.
> > 
> > If others disagree, then we should adjust KVM_AMD_SEV in the opposite direction,
> > because doing different things for SEV vs. TDX is confusing and messy.
> 
> + Dave (and Dan for TDX Connect).
> 
> Agree SEV/TDX should be in similar way.  But also I find SEV has a dependency on
> CRYPTO_DEV_SP_PSP, so perhaps it also reasonable to make an additional
> KVM_INTEL_TDX and make it depend on INTEL_TDX_HOST?
> 
> We could remove INTEL_TDX_HOST but only keep KVM_INTEL_TDX.  But in the long
> term, more kernel components will need to add TDX support (e.g., for TDX
> Connect).  I think the question is whether we can safely disable TDX code in ALL
> kernel components when KVM_INTEL_TDX is not enabled.
> 
> If the answer is yes (seems correct to me, because it seems meaningless to
> enable TDX code in _ANY_ kernel components when it's even possible to run TDX 
> guest), then I think we can just change the current INTEL_TDX_HOST to
> KVM_INTEL_TDX and put it in arch/x86/kvm/Kconfig.

I agree with Sean's later reply that kvm-intel.ko should fail if
anything that is expected to be there and not otherwise permanently
disabled fails setup.

However, I want to provide a counterpoint to this "_ANY_ kernel
component" dependency on being able to run a TDX guest. TDX Connect like
SEV-TIO offers device-security provisioning flows that are expected to
run before any confidential guest is being launched, and theoretically
may offer services independent of *ever* launching a guest (e.g. PCIe
link encrcyption without device assignment). So longer term, seamcalls
without kvm-intel.ko flexibility is useful, but in the near term a
coarse dependency on kvm-intel.ko is workable.
Huang, Kai Oct. 31, 2024, 10:37 p.m. UTC | #7
> However, I want to provide a counterpoint to this "_ANY_ kernel
> component" dependency on being able to run a TDX guest. TDX Connect like
> SEV-TIO offers device-security provisioning flows that are expected to
> run before any confidential guest is being launched, and theoretically
> may offer services independent of *ever* launching a guest (e.g. PCIe
> link encrcyption without device assignment). So longer term, seamcalls
> without kvm-intel.ko flexibility is useful, but in the near term a
> coarse dependency on kvm-intel.ko is workable.

Thanks for the info.

So it seems we should keep INTEL_TDX_HOST but add a new KVM_INTEL_TDX:

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index f09f13c01c6b..bcf4a1243013 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -126,6 +126,16 @@ config X86_SGX_KVM

           If unsure, say N.

+config KVM_INTEL_TDX
+       bool "Intel Trust Domain Extensions (TDX) support"
+       default y
+       depends on INTEL_TDX_HOST
+       help
+         Provides support for launching Intel Trust Domain Extensions
+         (TDX) confidential VMs on Intel processors.
+
+         If unsure, say N.
+
  config KVM_AMD
         tristate "KVM for AMD processors support"
         depends on KVM && (CPU_SUP_AMD || CPU_SUP_HYGON)
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index fec803aff7ad..a5d362c7b504 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -20,7 +20,7 @@ kvm-intel-y           += vmx/vmx.o vmx/vmenter.o 
vmx/pmu_intel.o vmx/vmcs12.o \

  kvm-intel-$(CONFIG_X86_SGX_KVM)        += vmx/sgx.o
  kvm-intel-$(CONFIG_KVM_HYPERV) += vmx/hyperv.o vmx/hyperv_evmcs.o
-kvm-intel-$(CONFIG_INTEL_TDX_HOST)     += vmx/tdx.o
+kvm-intel-$(CONFIG_KVM_INTEL_TDX)      += vmx/tdx.o

  kvm-amd-y              += svm/svm.o svm/vmenter.o svm/pmu.o 
svm/nested.o svm/avic.o

One thing is currently INTEL_TDX_HOST depends on KVM_INTEL (with the 
reason that for now only KVM will use TDX), we can either remove this 
dependency together with the above diff, or we can have another patch in 
the future to remove that when TDX Connect comes near.

I think we can leave this part to the future.

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 007bab9f2a0e..acc5a14dfbbc 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1974,7 +1974,6 @@ config INTEL_TDX_HOST
         bool "Intel Trust Domain Extensions (TDX) host support"
         depends on CPU_SUP_INTEL
         depends on X86_64
-       depends on KVM_INTEL
         depends on X86_X2APIC
         select ARCH_KEEP_MEMBLOCK
         depends on CONTIG_ALLOC
Dan Williams Oct. 31, 2024, 10:56 p.m. UTC | #8
Huang, Kai wrote:
[..]
> One thing is currently INTEL_TDX_HOST depends on KVM_INTEL (with the 
> reason that for now only KVM will use TDX), we can either remove this 
> dependency together with the above diff, or we can have another patch in 
> the future to remove that when TDX Connect comes near.
> 
> I think we can leave this part to the future.

Agree, save that for later. There is latent effort to disconnect VTx
from KVM that TDX Connect is also eyeing, and I expect that work is what
allows INTEL_TDX_HOST to drop its KVM_INTEL dependency.
Huang, Kai Nov. 6, 2024, 10:49 a.m. UTC | #9
On Thu, 2024-10-31 at 13:22 -0700, Sean Christopherson wrote:
> On Thu, Oct 31, 2024, Kai Huang wrote:
> > On Wed, 2024-10-30 at 08:19 -0700, Sean Christopherson wrote:
> > > > +void __init tdx_bringup(void)
> > > > +{
> > > > +	enable_tdx = enable_tdx && !__tdx_bringup();
> > > 
> > > Ah.  I don't love this approach because it mixes "failure" due to an unsupported
> > > configuration, with failure due to unexpected issues.  E.g. if enabling virtualization
> > > fails, loading KVM-the-module absolutely should fail too, not simply disable TDX.
> > 
> > Thanks for the comments.
> > 
> > I see your point.  However for "enabling virtualization failure" kvm_init() will
> > also try to do (default behaviour), so if it fails it will result in module
> > loading failure eventually.  So while I guess it would be slightly better to
> > make module loading fail if "enabling virtualization fails" in TDX, it is a nit
> > issue to me.
> > 
> > I think "enabling virtualization failure" is the only "unexpected issue" that
> > should result in module loading failure.  For any other TDX-specific
> > initialization failure (e.g., any memory allocation in future patches) it's
> > better to only disable TDX.
> 
> I disagree.  The platform owner wants TDX to be enabled, KVM shouldn't silently
> disable TDX because of a transient, unrelated failure.
> 
> If TDX _can't_ be supported, e.g. because EPT or MMIO SPTE caching was explicitly
> disable, then that's different.  And that's the general pattern throughout KVM.
> If a requested feature isn't supported, then KVM continues on updates the module
> param accordingly.  But if something outright fails during setup, KVM aborts the
> entire sequence.
> 
> > So I can change to "make loading KVM-the-module fail if enabling virtualization
> > fails in TDX", but I want to confirm this is what you want?
> 
> I would prefer the logic to be: reject loading kvm-intel.ko if an operation that
> would normally succeed, fails.

I looked at the final tdx.c that in our development branch [*], and below is the
list of the things that need to be done to init TDX (the code in
__tdx_bringup()), and my thinking of whether to fail loading the module or just
disable TDX:

1) Early dependency check fails.  Those include: tdp_mmu_enabled,
enable_mmio_caching, X86_FEATURE_MOVDIR64B check and check the presence of
TSX_CTL uret MSR.

For those we can disable TDX only but continue to load module.

2) Enable virtualization fails.

For this we fail to load module (as you suggested).

3) Fail to register TDX cpuhp to do tdx_cpu_enable() and handle cpu hotplug.

For this we only disable TDX but continue to load module.  The reason is I think
this is similar to enable a specific KVM feature but the hardware doesn't
support it.  We can go further to check the return value of tdx_cpu_enable() to
distinguish cases like "module not loaded" and "unexpected error", but I really
don't want to go that far.

4) tdx_enable() fails.

Ditto to 3).

5) tdx_get_sysinfo() fails.

This is a kernel bug since tdx_get_sysinfo() should always return valid TDX
sysinfo structure pointer after tdx_enable() is done successfully.  Currently we
just WARN() if the returned pointer is NULL and disable TDX only.  I think it's
also fine.

6) TDX global metadata check fails, e.g., MAX_VCPUS etc.

Ditto to 3).  For this we disable TDX only.

To summarize, if you agree with above, then only "enable virtualization failure"
results in module loading failure.

And I am not sure whether it's worth to distinguish those cases.  For the
concern of "KVM shouldn't silently disable TDX because of a transient, unrelated
failure", we can explicitly print error message to tell user why TDX is
disabled.

Any comments?

[*]
https://github.com/intel/tdx/blob/tdx_kvm_dev-2024-10-25.1-host-metadata-v6-rebase/arch/x86/kvm/vmx/tdx.c
Edgecombe, Rick P Nov. 6, 2024, 2:19 p.m. UTC | #10
On Thu, 2024-10-31 at 14:29 -0700, Rick Edgecombe wrote:
> > Thanks for the comments!
> 
> We had talked about you taking this series over, under the assumption that it
> was mostly settled. Would it help for us to spin another branch with these
> changes?

Discussing on the PUCK call, Paolo said to send another version with the changes
incorporated.
Sean Christopherson Nov. 6, 2024, 3:01 p.m. UTC | #11
On Wed, Nov 06, 2024, Kai Huang wrote:
> On Thu, 2024-10-31 at 13:22 -0700, Sean Christopherson wrote:
> > On Thu, Oct 31, 2024, Kai Huang wrote:
> > > On Wed, 2024-10-30 at 08:19 -0700, Sean Christopherson wrote:
> > > > > +void __init tdx_bringup(void)
> > > > > +{
> > > > > +	enable_tdx = enable_tdx && !__tdx_bringup();
> > > > 
> > > > Ah.  I don't love this approach because it mixes "failure" due to an unsupported
> > > > configuration, with failure due to unexpected issues.  E.g. if enabling virtualization
> > > > fails, loading KVM-the-module absolutely should fail too, not simply disable TDX.
> > > 
> > > Thanks for the comments.
> > > 
> > > I see your point.  However for "enabling virtualization failure" kvm_init() will
> > > also try to do (default behaviour), so if it fails it will result in module
> > > loading failure eventually.  So while I guess it would be slightly better to
> > > make module loading fail if "enabling virtualization fails" in TDX, it is a nit
> > > issue to me.
> > > 
> > > I think "enabling virtualization failure" is the only "unexpected issue" that
> > > should result in module loading failure.  For any other TDX-specific
> > > initialization failure (e.g., any memory allocation in future patches) it's
> > > better to only disable TDX.
> > 
> > I disagree.  The platform owner wants TDX to be enabled, KVM shouldn't silently
> > disable TDX because of a transient, unrelated failure.
> > 
> > If TDX _can't_ be supported, e.g. because EPT or MMIO SPTE caching was explicitly
> > disable, then that's different.  And that's the general pattern throughout KVM.
> > If a requested feature isn't supported, then KVM continues on updates the module
> > param accordingly.  But if something outright fails during setup, KVM aborts the
> > entire sequence.
> > 
> > > So I can change to "make loading KVM-the-module fail if enabling virtualization
> > > fails in TDX", but I want to confirm this is what you want?
> > 
> > I would prefer the logic to be: reject loading kvm-intel.ko if an operation that
> > would normally succeed, fails.
> 
> I looked at the final tdx.c that in our development branch [*], and below is the
> list of the things that need to be done to init TDX (the code in
> __tdx_bringup()), and my thinking of whether to fail loading the module or just
> disable TDX:
> 
> 1) Early dependency check fails.  Those include: tdp_mmu_enabled,
> enable_mmio_caching, X86_FEATURE_MOVDIR64B check and check the presence of
> TSX_CTL uret MSR.
> 
> For those we can disable TDX only but continue to load module.
> 
> 2) Enable virtualization fails.
> 
> For this we fail to load module (as you suggested).
> 
> 3) Fail to register TDX cpuhp to do tdx_cpu_enable() and handle cpu hotplug.
> 
> For this we only disable TDX but continue to load module.  The reason is I think
> this is similar to enable a specific KVM feature but the hardware doesn't
> support it.  We can go further to check the return value of tdx_cpu_enable() to
> distinguish cases like "module not loaded" and "unexpected error", but I really
> don't want to go that far.

Hrm, tdx_cpu_enable() is a bit of a mess.  Ideally, there would be a separate
"probe" API so that KVM could detect if TDX is supported.  Though maybe it's the
TDX module itself is flawed, e.g. if TDH_SYS_INIT is literally the only way to
detect whether or not a module is loaded.

So, absent a way to clean up tdx_cpu_enable(), maybe disable the module param if
it returns -ENODEV, otherwise fail the module load?

> 4) tdx_enable() fails.
> 
> Ditto to 3).

No, this should fail the module load.  E.g. most of the error conditions are
-ENOMEM, which has nothing to do with host support for TDX.

> 5) tdx_get_sysinfo() fails.
> 
> This is a kernel bug since tdx_get_sysinfo() should always return valid TDX
> sysinfo structure pointer after tdx_enable() is done successfully.  Currently we
> just WARN() if the returned pointer is NULL and disable TDX only.  I think it's
> also fine.
> 
> 6) TDX global metadata check fails, e.g., MAX_VCPUS etc.
> 
> Ditto to 3).  For this we disable TDX only.

Where is this code?
Huang, Kai Nov. 6, 2024, 8:06 p.m. UTC | #12
On Wed, 2024-11-06 at 07:01 -0800, Sean Christopherson wrote:
> On Wed, Nov 06, 2024, Kai Huang wrote:
> > On Thu, 2024-10-31 at 13:22 -0700, Sean Christopherson wrote:
> > > On Thu, Oct 31, 2024, Kai Huang wrote:
> > > > On Wed, 2024-10-30 at 08:19 -0700, Sean Christopherson wrote:
> > > > > > +void __init tdx_bringup(void)
> > > > > > +{
> > > > > > +	enable_tdx = enable_tdx && !__tdx_bringup();
> > > > > 
> > > > > Ah.  I don't love this approach because it mixes "failure" due to an unsupported
> > > > > configuration, with failure due to unexpected issues.  E.g. if enabling virtualization
> > > > > fails, loading KVM-the-module absolutely should fail too, not simply disable TDX.
> > > > 
> > > > Thanks for the comments.
> > > > 
> > > > I see your point.  However for "enabling virtualization failure" kvm_init() will
> > > > also try to do (default behaviour), so if it fails it will result in module
> > > > loading failure eventually.  So while I guess it would be slightly better to
> > > > make module loading fail if "enabling virtualization fails" in TDX, it is a nit
> > > > issue to me.
> > > > 
> > > > I think "enabling virtualization failure" is the only "unexpected issue" that
> > > > should result in module loading failure.  For any other TDX-specific
> > > > initialization failure (e.g., any memory allocation in future patches) it's
> > > > better to only disable TDX.
> > > 
> > > I disagree.  The platform owner wants TDX to be enabled, KVM shouldn't silently
> > > disable TDX because of a transient, unrelated failure.
> > > 
> > > If TDX _can't_ be supported, e.g. because EPT or MMIO SPTE caching was explicitly
> > > disable, then that's different.  And that's the general pattern throughout KVM.
> > > If a requested feature isn't supported, then KVM continues on updates the module
> > > param accordingly.  But if something outright fails during setup, KVM aborts the
> > > entire sequence.
> > > 
> > > > So I can change to "make loading KVM-the-module fail if enabling virtualization
> > > > fails in TDX", but I want to confirm this is what you want?
> > > 
> > > I would prefer the logic to be: reject loading kvm-intel.ko if an operation that
> > > would normally succeed, fails.
> > 
> > I looked at the final tdx.c that in our development branch [*], and below is the
> > list of the things that need to be done to init TDX (the code in
> > __tdx_bringup()), and my thinking of whether to fail loading the module or just
> > disable TDX:
> > 
> > 1) Early dependency check fails.  Those include: tdp_mmu_enabled,
> > enable_mmio_caching, X86_FEATURE_MOVDIR64B check and check the presence of
> > TSX_CTL uret MSR.
> > 
> > For those we can disable TDX only but continue to load module.
> > 
> > 2) Enable virtualization fails.
> > 
> > For this we fail to load module (as you suggested).
> > 
> > 3) Fail to register TDX cpuhp to do tdx_cpu_enable() and handle cpu hotplug.
> > 
> > For this we only disable TDX but continue to load module.  The reason is I think
> > this is similar to enable a specific KVM feature but the hardware doesn't
> > support it.  We can go further to check the return value of tdx_cpu_enable() to
> > distinguish cases like "module not loaded" and "unexpected error", but I really
> > don't want to go that far.
> 
> Hrm, tdx_cpu_enable() is a bit of a mess.  Ideally, there would be a separate
> "probe" API so that KVM could detect if TDX is supported.  Though maybe it's the
> TDX module itself is flawed, e.g. if TDH_SYS_INIT is literally the only way to
> detect whether or not a module is loaded.

We can also use P-SEAMLDR SEAMCALL to query, but I see no difference between
using TDH_SYS_INIT.  If you are asking whether there's CPUID or MSR to query
then no.

> 
> So, absent a way to clean up tdx_cpu_enable(), maybe disable the module param if
> it returns -ENODEV, otherwise fail the module load?

We can, but we need to assume cpuhp_setup_state_cpuslocked() itself will not
return -ENODEV (it is true now), otherwise we won't be able to distinguish
whether the -ENODEV was from cpuhp_setup_state_cpuslocked() or tdx_cpu_enable().

Unless we choose to do tdx_cpu_enable() via on_each_cpu() separately.

Btw tdx_cpu_enable() itself will print "module not loaded" in case of -ENODEV,
so the user will be aware anyway if we only disable TDX but not fail module
loading.

My concern is still the whole "different handling of error cases" seems over-
engineering.

> 
> > 4) tdx_enable() fails.
> > 
> > Ditto to 3).
> 
> No, this should fail the module load.  E.g. most of the error conditions are
> -ENOMEM, which has nothing to do with host support for TDX.
> 
> > 5) tdx_get_sysinfo() fails.
> > 
> > This is a kernel bug since tdx_get_sysinfo() should always return valid TDX
> > sysinfo structure pointer after tdx_enable() is done successfully.  Currently we
> > just WARN() if the returned pointer is NULL and disable TDX only.  I think it's
> > also fine.
> > 
> > 6) TDX global metadata check fails, e.g., MAX_VCPUS etc.
> > 
> > Ditto to 3).  For this we disable TDX only.
> 
> Where is this code?

Please check:

https://github.com/intel/tdx/blob/tdx_kvm_dev-2024-10-25.1-host-metadata-v6-rebase/arch/x86/kvm/vmx/tdx.c

.. starting at line 3320.

If you want individual commits, here's the list:


KVM: TDX: Get TDX global Information
https://github.com/intel/tdx/commit/6ae3ab1ddb51a4cf0f0810853a24d47d360abaea

KVM: TDX: Get system-wide info about TDX module on initialization
https://github.com/intel/tdx/commit/fd7947118b76f6d4256bc4228e03e73262e67ba2

KVM: TDX: Support per-VM KVM_CAP_MAX_VCPUS extension check
https://github.com/intel/tdx/commit/98162cf99ee728b97a0c9647bd2b39a254da6a4a
Sean Christopherson Nov. 7, 2024, 10:04 p.m. UTC | #13
On Wed, Nov 06, 2024, Kai Huang wrote:
> On Wed, 2024-11-06 at 07:01 -0800, Sean Christopherson wrote:
> > On Wed, Nov 06, 2024, Kai Huang wrote:
> > > For this we only disable TDX but continue to load module.  The reason is I think
> > > this is similar to enable a specific KVM feature but the hardware doesn't
> > > support it.  We can go further to check the return value of tdx_cpu_enable() to
> > > distinguish cases like "module not loaded" and "unexpected error", but I really
> > > don't want to go that far.
> > 
> > Hrm, tdx_cpu_enable() is a bit of a mess.  Ideally, there would be a separate
> > "probe" API so that KVM could detect if TDX is supported.  Though maybe it's the
> > TDX module itself is flawed, e.g. if TDH_SYS_INIT is literally the only way to
> > detect whether or not a module is loaded.
> 
> We can also use P-SEAMLDR SEAMCALL to query, but I see no difference between
> using TDH_SYS_INIT.  If you are asking whether there's CPUID or MSR to query
> then no.

Doesn't have to be a CPUID or MSR, anything idempotent would work.  Which begs
the question, is that P-SEAMLDR SEAMCALL query you have in mind idempotent? :-)

> > So, absent a way to clean up tdx_cpu_enable(), maybe disable the module param if
> > it returns -ENODEV, otherwise fail the module load?
> 
> We can, but we need to assume cpuhp_setup_state_cpuslocked() itself will not
> return -ENODEV (it is true now), otherwise we won't be able to distinguish
> whether the -ENODEV was from cpuhp_setup_state_cpuslocked() or tdx_cpu_enable().
> 
> Unless we choose to do tdx_cpu_enable() via on_each_cpu() separately.
> 
> Btw tdx_cpu_enable() itself will print "module not loaded" in case of -ENODEV,
> so the user will be aware anyway if we only disable TDX but not fail module
> loading.

That only helps if a human looks at dmesg before attempting to run a TDX VM on
the host, and parsing dmesg to treat that particular scenario as fatal isn't
something I want to recommend to end users.  E.g. if our platform configuration
screwed up and failed to load a TDX module, then I want that to be surfaced as
an alarm of sorts, not a silent "this platform doesn't support TDX" flag.

> My concern is still the whole "different handling of error cases" seems over-
> engineering.

IMO, that's a symptom of the TDX enabling code not cleanly separating "probe"
from "enable", and at a glance, that seems very solvable.  And I suspect that
cleaning things up will allow for additional hardening.  E.g. I assume the lack
of MOVDIR64B should be a WARN, but because KVM checks for MOVDIR64B before
checking for basic TDX support, it's an non-commitalpr_warn().

> > > 4) tdx_enable() fails.
> > > 
> > > Ditto to 3).
> > 
> > No, this should fail the module load.  E.g. most of the error conditions are
> > -ENOMEM, which has nothing to do with host support for TDX.
> > 
> > > 5) tdx_get_sysinfo() fails.
> > > 
> > > This is a kernel bug since tdx_get_sysinfo() should always return valid TDX
> > > sysinfo structure pointer after tdx_enable() is done successfully.  Currently we
> > > just WARN() if the returned pointer is NULL and disable TDX only.  I think it's
> > > also fine.
> > > 
> > > 6) TDX global metadata check fails, e.g., MAX_VCPUS etc.
> > > 
> > > Ditto to 3).  For this we disable TDX only.
> > 
> > Where is this code?
> 
> Please check:
> 
> https://github.com/intel/tdx/blob/tdx_kvm_dev-2024-10-25.1-host-metadata-v6-rebase/arch/x86/kvm/vmx/tdx.c
> 
> .. starting at line 3320.

Before I forget, that code has a bug.  This

	/* Check TDX module and KVM capabilities */
	if (!tdx_get_supported_attrs(&tdx_sysinfo->td_conf) ||
	    !tdx_get_supported_xfam(&tdx_sysinfo->td_conf))
		goto get_sysinfo_err;

will return '0' on error, instead of -EINVAL (or whatever it intends).

Back to the main discussion, these checks are obvious "probe" failures and so
should disable TDX without failing module load:

	if (!tdp_mmu_enabled || !enable_mmio_caching)
		return -EOPNOTSUPP;

	if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) {
		pr_warn("MOVDIR64B is reqiured for TDX\n");
		return -EOPNOTSUPP;
	}

A kvm_find_user_return_msr() error is obviously a KVM bug, i.e. should definitely
WARN and fail module module.  Ditto for kvm_enable_virtualization().

The boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM) that's buried in tdx_enable()
really belongs in KVM.  Having it in both is totally fine, but KVM shouldn't do
a bunch of work and _then_ check if all that work was pointless.

I am ok treating everything at or after tdx_get_sysinfo() as fatal to module load,
especially since, IIUC, TD_SYS_INIT can't be undone, i.e. KVM has crossed a point
of no return.

In short, assuming KVM can query if a TDX module is a loaded, I don't think it's
all that much work to do:

  static bool kvm_is_tdx_supported(void)
  {
	if (boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
		return false;

	if (!<is TDX module loaded>)
		return false;

	if (!tdp_mmu_enabled || !enable_mmio_caching)
		return false;

	if (WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)))
		return false;

	return true;
  }

  int __init tdx_bringup(void)
  {
	enable_tdx = enable_tdx && kvm_is_tdx_supported();
	if (!enable_tdx)
		return 0;

	return __tdx_bringup();
  }
Huang, Kai Nov. 7, 2024, 11:25 p.m. UTC | #14
On 8/11/2024 11:04 am, Sean Christopherson wrote:
> On Wed, Nov 06, 2024, Kai Huang wrote:
>> On Wed, 2024-11-06 at 07:01 -0800, Sean Christopherson wrote:
>>> On Wed, Nov 06, 2024, Kai Huang wrote:
>>>> For this we only disable TDX but continue to load module.  The reason is I think
>>>> this is similar to enable a specific KVM feature but the hardware doesn't
>>>> support it.  We can go further to check the return value of tdx_cpu_enable() to
>>>> distinguish cases like "module not loaded" and "unexpected error", but I really
>>>> don't want to go that far.
>>>
>>> Hrm, tdx_cpu_enable() is a bit of a mess.  Ideally, there would be a separate
>>> "probe" API so that KVM could detect if TDX is supported.  Though maybe it's the
>>> TDX module itself is flawed, e.g. if TDH_SYS_INIT is literally the only way to
>>> detect whether or not a module is loaded.
>>
>> We can also use P-SEAMLDR SEAMCALL to query, but I see no difference between
>> using TDH_SYS_INIT.  If you are asking whether there's CPUID or MSR to query
>> then no.
> 
> Doesn't have to be a CPUID or MSR, anything idempotent would work.  Which begs
> the question, is that P-SEAMLDR SEAMCALL query you have in mind idempotent? :-)

It is the SEAMLDR.INFO SEAMCALL, which writes bunch of information to a 
SEAMLDR_INFO structure.  There's one bit "SEAM_READY" which (when true) 
indicates the TDX module is ready for SEAMCALL, i.e., the module is loaded.

And yes it is idempotent I believe, even we consider TDX module runtime 
reload/update.

The problem is it is a SEAMCALL, and being a SEAMCALL requires all 
things like enabling virtualization first and adding another wrapper API 
and structure definition to do SEAMLDR.INFO (and we are still in 
discussion how to export SEAMCALLs to let KVM make).

 From this perspective, I don't see a big difference between using 
SEAMLDR.INFO and tdx_cpu_enable() for probing TDX.

I agree we can change to use SEAMLDR.INFO to detect in the long term 
after we move VMXON out of KVM, though, because we can get a lot more 
information with that besides whether module is loaded.  But before 
that, I see no big difference.

> 
>>> So, absent a way to clean up tdx_cpu_enable(), maybe disable the module param if
>>> it returns -ENODEV, otherwise fail the module load?
>>
>> We can, but we need to assume cpuhp_setup_state_cpuslocked() itself will not
>> return -ENODEV (it is true now), otherwise we won't be able to distinguish
>> whether the -ENODEV was from cpuhp_setup_state_cpuslocked() or tdx_cpu_enable().
>>
>> Unless we choose to do tdx_cpu_enable() via on_each_cpu() separately.
>>
>> Btw tdx_cpu_enable() itself will print "module not loaded" in case of -ENODEV,
>> so the user will be aware anyway if we only disable TDX but not fail module
>> loading.
> 
> That only helps if a human looks at dmesg before attempting to run a TDX VM on
> the host, and parsing dmesg to treat that particular scenario as fatal isn't
> something I want to recommend to end users.  E.g. if our platform configuration
> screwed up and failed to load a TDX module, then I want that to be surfaced as
> an alarm of sorts, not a silent "this platform doesn't support TDX" flag.
> 
>> My concern is still the whole "different handling of error cases" seems over-
>> engineering.
> 
> IMO, that's a symptom of the TDX enabling code not cleanly separating "probe"
> from "enable", and at a glance, that seems very solvable.  

I am not so sure about this at this stage, because you need to make a 
SEAMCALL anyway for that. :-)

I think we can document this imperfection for now and enhance after 
moving VMXON out of KVM.

Btw we are going to add P-SEAMLDR SEAMCALLs to support TDX runtime 
reload/update anyway, so we can add SEAMLDR.INFO there (or before that...).

> And I suspect that
> cleaning things up will allow for additional hardening.  E.g. I assume the lack
> of MOVDIR64B should be a WARN, but because KVM checks for MOVDIR64B before
> checking for basic TDX support, it's an non-commitalpr_warn().

Yeah if we check TDX_HOST_PLATFORM first then we can WARN() on MOVDIR64B.

> 
>>>> 4) tdx_enable() fails.
>>>>
>>>> Ditto to 3).
>>>
>>> No, this should fail the module load.  E.g. most of the error conditions are
>>> -ENOMEM, which has nothing to do with host support for TDX.
>>>
>>>> 5) tdx_get_sysinfo() fails.
>>>>
>>>> This is a kernel bug since tdx_get_sysinfo() should always return valid TDX
>>>> sysinfo structure pointer after tdx_enable() is done successfully.  Currently we
>>>> just WARN() if the returned pointer is NULL and disable TDX only.  I think it's
>>>> also fine.
>>>>
>>>> 6) TDX global metadata check fails, e.g., MAX_VCPUS etc.
>>>>
>>>> Ditto to 3).  For this we disable TDX only.
>>>
>>> Where is this code?
>>
>> Please check:
>>
>> https://github.com/intel/tdx/blob/tdx_kvm_dev-2024-10-25.1-host-metadata-v6-rebase/arch/x86/kvm/vmx/tdx.c
>>
>> .. starting at line 3320.
> 
> Before I forget, that code has a bug.  This
> 
> 	/* Check TDX module and KVM capabilities */
> 	if (!tdx_get_supported_attrs(&tdx_sysinfo->td_conf) ||
> 	    !tdx_get_supported_xfam(&tdx_sysinfo->td_conf))
> 		goto get_sysinfo_err;
> 
> will return '0' on error, instead of -EINVAL (or whatever it intends).

Indeed.  Thanks for catching this.  I'll report to Xiaoyao.

> 
> Back to the main discussion, these checks are obvious "probe" failures and so
> should disable TDX without failing module load:
> 
> 	if (!tdp_mmu_enabled || !enable_mmio_caching)
> 		return -EOPNOTSUPP;
> 
> 	if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) {
> 		pr_warn("MOVDIR64B is reqiured for TDX\n");
> 		return -EOPNOTSUPP;
> 	}

Yeah sure.

> 
> A kvm_find_user_return_msr() error is obviously a KVM bug, i.e. should definitely
> WARN and fail module module.  Ditto for kvm_enable_virtualization().

OK agreed.

> 
> The boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM) that's buried in tdx_enable()
> really belongs in KVM.  Having it in both is totally fine, but KVM shouldn't do
> a bunch of work and _then_ check if all that work was pointless.

Fine to me.

> 
> I am ok treating everything at or after tdx_get_sysinfo() as fatal to module load,
> especially since, IIUC, TD_SYS_INIT can't be undone, i.e. KVM has crossed a point
> of no return.
> 
> In short, assuming KVM can query if a TDX module is a loaded, I don't think it's
> all that much work to do:
> 
>    static bool kvm_is_tdx_supported(void)
>    {
> 	if (boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
> 		return false;
> 
> 	if (!<is TDX module loaded>)
> 		return false;
> 
> 	if (!tdp_mmu_enabled || !enable_mmio_caching)
> 		return false;
> 
> 	if (WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)))
> 		return false;
> 
> 	return true;
>    }
> 
>    int __init tdx_bringup(void)
>    {
> 	enable_tdx = enable_tdx && kvm_is_tdx_supported();
> 	if (!enable_tdx)
> 		return 0;
> 
> 	return __tdx_bringup();
>    }

Thanks for clarifying this.

As mentioned above, I would say we just use tdx_cpu_enable() to "probe" 
whether module is present before moving VMXON out of KVM.  We can 
document this imperfection for now and revisit later.

How about:

static bool kvm_can_support_tdx(void)
{
	if (boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
		return false;

	if (!tdp_mmu_enabled || !enable_mmio_caching)
  		return false;

	if (WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)))
		return false;
}

int __init tdx_bringup(void)
{
	int r;

	enable_tdx = enable_tdx && kvm_can_support_tdx();

	if (!enable_tdx)
		return 0;

	/*
	 * Ideally KVM should probe whether TDX module has been loaded
	 * first and then try to bring it up, because KVM should treat
	 * them differently.  I.e., KVM should just disable TDX while
	 * still allow module to be loaded when TDX module is not
	 * loaded, but fail to load module at all when fail to bring up
	 * TDX.
	 *
	 * But unfortunately TDX needs to use SEAMCALL to probe whether
	 * the module is loaded (there is not CPUID or MSR for that),
	 * and making SEAMCALL requires enabling virtualization first (
	 * like the rest steps of bringing up TDX module).
	 *
	 * For simplicity, do the probing and bringing up together for
	 * now.
	 *
	 * Note the first SEAMCALL to bringing up TDX will return
	 * -ENODEV when module is not loaded, and this serves the probe
	 * albeit it is not perfect.
	 *
	 * Another option is using P-SEAMLDR's SEAMLDR.INFO SEAMCALL to
	 * probe, but it is still a SEAMCALL.  Currently kernel doesn't
	 * support P-SEAMLDR SEAMCALLs so don't bother to add it just
	 * for probing TDX module.
	 *
	 * Again, this is not perfect, and can be revisited once VMXON
	 * is moved to the core-kernel.
	 */
	r = __tdx_probe_and_bringup();
	if (r) {
		enable_tdx = 0;
		/*
		 * Disable TDX only but don't fail to load module when
		 * TDX module is not loaded.  No need to print error
		 * message since __tdx_probe_and_bringup() already did
		 * that in this case.
		 */
		if (r == -ENODEV)
			r = 0;
	}
	
	return r;
}
diff mbox series

Patch

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f9dddb8cb466..fec803aff7ad 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -20,6 +20,7 @@  kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
 
 kvm-intel-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
 kvm-intel-$(CONFIG_KVM_HYPERV)	+= vmx/hyperv.o vmx/hyperv_evmcs.o
+kvm-intel-$(CONFIG_INTEL_TDX_HOST)	+= vmx/tdx.o
 
 kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o
 
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 433ecbd90905..053294939eb1 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -6,6 +6,7 @@ 
 #include "nested.h"
 #include "pmu.h"
 #include "posted_intr.h"
+#include "tdx.h"
 
 #define VMX_REQUIRED_APICV_INHIBITS				\
 	(BIT(APICV_INHIBIT_REASON_DISABLED) |			\
@@ -170,6 +171,7 @@  struct kvm_x86_init_ops vt_init_ops __initdata = {
 static void vt_exit(void)
 {
 	kvm_exit();
+	tdx_cleanup();
 	vmx_exit();
 }
 module_exit(vt_exit);
@@ -182,6 +184,9 @@  static int __init vt_init(void)
 	if (r)
 		return r;
 
+	/* tdx_init() has been taken */
+	tdx_bringup();
+
 	/*
 	 * Common KVM initialization _must_ come last, after this, /dev/kvm is
 	 * exposed to userspace!
@@ -194,6 +199,7 @@  static int __init vt_init(void)
 	return 0;
 
 err_kvm_init:
+	tdx_cleanup();
 	vmx_exit();
 	return r;
 }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
new file mode 100644
index 000000000000..8651599822d5
--- /dev/null
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -0,0 +1,115 @@ 
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/cpu.h>
+#include <asm/tdx.h>
+#include "capabilities.h"
+#include "tdx.h"
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+static bool enable_tdx __ro_after_init;
+module_param_named(tdx, enable_tdx, bool, 0444);
+
+static enum cpuhp_state tdx_cpuhp_state;
+
+static int tdx_online_cpu(unsigned int cpu)
+{
+	unsigned long flags;
+	int r;
+
+	/* Sanity check CPU is already in post-VMXON */
+	WARN_ON_ONCE(!(cr4_read_shadow() & X86_CR4_VMXE));
+
+	/* tdx_cpu_enable() must be called with IRQ disabled */
+	local_irq_save(flags);
+	r = tdx_cpu_enable();
+	local_irq_restore(flags);
+
+	return r;
+}
+
+static void __do_tdx_cleanup(void)
+{
+	/*
+	 * Once TDX module is initialized, it cannot be disabled and
+	 * re-initialized again w/o runtime update (which isn't
+	 * supported by kernel).  In fact the kernel doesn't support
+	 * disable (shut down) TDX module, so only need to remove the
+	 * cpuhp state.
+	 */
+	WARN_ON_ONCE(!tdx_cpuhp_state);
+	cpuhp_remove_state_nocalls(tdx_cpuhp_state);
+	tdx_cpuhp_state = 0;
+}
+
+static int __init __do_tdx_bringup(void)
+{
+	int r;
+
+	/*
+	 * TDX-specific cpuhp callback to call tdx_cpu_enable() on all
+	 * online CPUs before calling tdx_enable(), and on any new
+	 * going-online CPU to make sure it is ready for TDX guest.
+	 */
+	r = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN,
+					 "kvm/cpu/tdx:online",
+					 tdx_online_cpu, NULL);
+	if (r < 0)
+		return r;
+
+	tdx_cpuhp_state = r;
+
+	/* tdx_enable() must be called with cpus_read_lock() */
+	r = tdx_enable();
+	if (r)
+		__do_tdx_cleanup();
+
+	return r;
+}
+
+static int __init __tdx_bringup(void)
+{
+	int r;
+
+	if (!enable_ept) {
+		pr_err("Cannot enable TDX with EPT disabled.\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Enabling TDX requires enabling hardware virtualization first,
+	 * as making SEAMCALLs requires CPU being in post-VMXON state.
+	 */
+	r = kvm_enable_virtualization();
+	if (r)
+		return r;
+
+	cpus_read_lock();
+	r = __do_tdx_bringup();
+	cpus_read_unlock();
+
+	if (r)
+		goto tdx_bringup_err;
+
+	/*
+	 * Leave hardware virtualization enabled after TDX is enabled
+	 * successfully.  TDX CPU hotplug depends on this.
+	 */
+	return 0;
+tdx_bringup_err:
+	kvm_disable_virtualization();
+	return r;
+}
+
+void tdx_cleanup(void)
+{
+	if (enable_tdx) {
+		__do_tdx_cleanup();
+		kvm_disable_virtualization();
+	}
+}
+
+void __init tdx_bringup(void)
+{
+	enable_tdx = enable_tdx && !__tdx_bringup();
+}
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
new file mode 100644
index 000000000000..766a6121f670
--- /dev/null
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -0,0 +1,12 @@ 
+#ifndef  __KVM_X86_VMX_TDX_H
+#define __KVM_X86_VMX_TDX_H
+
+#ifdef CONFIG_INTEL_TDX_HOST
+void tdx_bringup(void);
+void tdx_cleanup(void);
+#else
+static inline void tdx_bringup(void) {}
+static inline void tdx_cleanup(void) {}
+#endif
+
+#endif