Message ID | f150316b975b5ca22c6c4016ffd90db79d657bbf.1678111292.git.kai.huang@intel.com (mailing list archive) |
---|---|
State | New |
Series | TDX host kernel support |
On Tue, Mar 07, 2023 at 03:13:50AM +1300, Kai Huang <kai.huang@intel.com> wrote: > To enable TDX the kernel needs to initialize TDX from two perspectives: > 1) Do a set of SEAMCALLs to initialize the TDX module to make it ready > to create and run TDX guests; 2) Do the per-cpu initialization SEAMCALL > on one logical cpu before the kernel wants to make any other SEAMCALLs > on that cpu (including those involved during module initialization and > running TDX guests). > > The TDX module can be initialized only once in its lifetime. Instead > of always initializing it at boot time, this implementation chooses an > "on demand" approach to initialize TDX until there is a real need (e.g > when requested by KVM). This approach has below pros: > > 1) It avoids consuming the memory that must be allocated by kernel and > given to the TDX module as metadata (~1/256th of the TDX-usable memory), > and also saves the CPU cycles of initializing the TDX module (and the > metadata) when TDX is not used at all. > > 2) The TDX module design allows it to be updated while the system is > running. The update procedure shares quite a few steps with this "on > demand" initialization mechanism. The hope is that much of "on demand" > mechanism can be shared with a future "update" mechanism. A boot-time > TDX module implementation would not be able to share much code with the > update mechanism. > > 3) Making SEAMCALL requires VMX to be enabled. Currently, only the KVM > code mucks with VMX enabling. If the TDX module were to be initialized > separately from KVM (like at boot), the boot code would need to be > taught how to muck with VMX enabling and KVM would need to be taught how > to cope with that. Making KVM itself responsible for TDX initialization > lets the rest of the kernel stay blissfully unaware of VMX. > > Similar to module initialization, also make the per-cpu initialization > "on demand" as it also depends on VMX to be enabled. 
> > Add two functions, tdx_enable() and tdx_cpu_enable(), to enable the TDX > module and enable TDX on local cpu respectively. For now tdx_enable() > is a placeholder. The TODO list will be pared down as functionality is > added. > > In tdx_enable() use a state machine protected by mutex to make sure the > initialization will only be done once, as tdx_enable() can be called > multiple times (i.e. KVM module can be reloaded) and may be called > concurrently by other kernel components in the future. > > The per-cpu initialization on each cpu can only be done once during the > module's life time. Use a per-cpu variable to track its status to make > sure it is only done once in tdx_cpu_enable(). > > Also, a SEAMCALL to do TDX module global initialization must be done > once on any logical cpu before any per-cpu initialization SEAMCALL. Do > it inside tdx_cpu_enable() too (if hasn't been done). > > tdx_enable() can potentially invoke SEAMCALLs on any online cpus. The > per-cpu initialization must be done before those SEAMCALLs are invoked > on some cpu. To keep things simple, in tdx_cpu_enable(), always do the > per-cpu initialization regardless of whether the TDX module has been > initialized or not. And in tdx_enable(), don't call tdx_cpu_enable() > but assume the caller has disabled CPU hotplug and done VMXON and > tdx_cpu_enable() on all online cpus before calling tdx_enable(). > > Signed-off-by: Kai Huang <kai.huang@intel.com> > --- > > v9 -> v10: > - Merged the patch to handle per-cpu initialization to this patch to > tell the story better. > - Changed how to handle the per-cpu initialization to only provide a > tdx_cpu_enable() function to let the user of TDX to do it when the > user wants to run TDX code on a certain cpu. > - Changed tdx_enable() to not call cpus_read_lock() explicitly, but > call lockdep_assert_cpus_held() to assume the caller has done that. > - Improved comments around tdx_enable() and tdx_cpu_enable(). 
> - Improved changelog to tell the story better accordingly. > > v8 -> v9: > - Removed detailed TODO list in the changelog (Dave). > - Added back steps to do module global initialization and per-cpu > initialization in the TODO list comment. > - Moved the 'enum tdx_module_status_t' from tdx.c to local tdx.h > > v7 -> v8: > - Refined changelog (Dave). > - Removed "all BIOS-enabled cpus" related code (Peter/Thomas/Dave). > - Add a "TODO list" comment in init_tdx_module() to list all steps of > initializing the TDX Module to tell the story (Dave). > - Made tdx_enable() unverisally return -EINVAL, and removed nonsense > comments (Dave). > - Simplified __tdx_enable() to only handle success or failure. > - TDX_MODULE_SHUTDOWN -> TDX_MODULE_ERROR > - Removed TDX_MODULE_NONE (not loaded) as it is not necessary. > - Improved comments (Dave). > - Pointed out 'tdx_module_status' is software thing (Dave). > > v6 -> v7: > - No change. > > v5 -> v6: > - Added code to set status to TDX_MODULE_NONE if TDX module is not > loaded (Chao) > - Added Chao's Reviewed-by. > - Improved comments around cpus_read_lock(). > > - v3->v5 (no feedback on v4): > - Removed the check that SEAMRR and TDX KeyID have been detected on > all present cpus. > - Removed tdx_detect(). > - Added num_online_cpus() to MADT-enabled CPUs check within the CPU > hotplug lock and return early with error message. > - Improved dmesg printing for TDX module detection and initialization. 
> > --- > arch/x86/include/asm/tdx.h | 4 + > arch/x86/virt/vmx/tdx/tdx.c | 182 ++++++++++++++++++++++++++++++++++++ > arch/x86/virt/vmx/tdx/tdx.h | 25 +++++ > 3 files changed, 211 insertions(+) > > diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h > index b489b5b9de5d..112a5b9bd5cd 100644 > --- a/arch/x86/include/asm/tdx.h > +++ b/arch/x86/include/asm/tdx.h > @@ -102,8 +102,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, > > #ifdef CONFIG_INTEL_TDX_HOST > bool platform_tdx_enabled(void); > +int tdx_cpu_enable(void); > +int tdx_enable(void); > #else /* !CONFIG_INTEL_TDX_HOST */ > static inline bool platform_tdx_enabled(void) { return false; } > +static inline int tdx_cpu_enable(void) { return -EINVAL; } > +static inline int tdx_enable(void) { return -EINVAL; } > #endif /* CONFIG_INTEL_TDX_HOST */ > > #endif /* !__ASSEMBLY__ */ > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c > index b65b838f3b5d..29127cb70f51 100644 > --- a/arch/x86/virt/vmx/tdx/tdx.c > +++ b/arch/x86/virt/vmx/tdx/tdx.c > @@ -13,6 +13,10 @@ > #include <linux/errno.h> > #include <linux/printk.h> > #include <linux/smp.h> > +#include <linux/cpu.h> > +#include <linux/spinlock.h> > +#include <linux/percpu-defs.h> > +#include <linux/mutex.h> > #include <asm/msr-index.h> > #include <asm/msr.h> > #include <asm/tdx.h> > @@ -22,6 +26,18 @@ static u32 tdx_global_keyid __ro_after_init; > static u32 tdx_guest_keyid_start __ro_after_init; > static u32 tdx_nr_guest_keyids __ro_after_init; > > +static unsigned int tdx_global_init_status; > +static DEFINE_SPINLOCK(tdx_global_init_lock); > +#define TDX_GLOBAL_INIT_DONE _BITUL(0) > +#define TDX_GLOBAL_INIT_FAILED _BITUL(1) > + > +static DEFINE_PER_CPU(unsigned int, tdx_lp_init_status); > +#define TDX_LP_INIT_DONE _BITUL(0) > +#define TDX_LP_INIT_FAILED _BITUL(1) > + > +static enum tdx_module_status_t tdx_module_status; > +static DEFINE_MUTEX(tdx_module_lock); > + > /* > * Use tdx_global_keyid to 
indicate that TDX is uninitialized. > * This is used in TDX initialization error paths to take it from > @@ -159,3 +175,169 @@ static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9, > put_cpu(); > return ret; > } > + > +static int try_init_module_global(void) > +{ > + int ret; > + > + /* > + * The TDX module global initialization only needs to be done > + * once on any cpu. > + */ > + spin_lock(&tdx_global_init_lock); > + > + if (tdx_global_init_status & TDX_GLOBAL_INIT_DONE) { > + ret = tdx_global_init_status & TDX_GLOBAL_INIT_FAILED ? > + -EINVAL : 0; > + goto out; > + } > + > + /* All '0's are just unused parameters. */ > + ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL); > + > + tdx_global_init_status = TDX_GLOBAL_INIT_DONE; > + if (ret) > + tdx_global_init_status |= TDX_GLOBAL_INIT_FAILED; If entropy is lacking (rdrand failure), TDH_SYS_INIT can return TDX_SYS_BUSY. In such case, we should allow the caller to retry or make this function retry instead of marking error stickily. Except that, Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Thanks,
On Wed, 2023-03-08 at 14:27 -0800, Isaku Yamahata wrote: > > + > > +static int try_init_module_global(void) > > +{ > > + int ret; > > + > > + /* > > + * The TDX module global initialization only needs to be done > > + * once on any cpu. > > + */ > > + spin_lock(&tdx_global_init_lock); > > + > > + if (tdx_global_init_status & TDX_GLOBAL_INIT_DONE) { > > + ret = tdx_global_init_status & TDX_GLOBAL_INIT_FAILED ? > > + -EINVAL : 0; > > + goto out; > > + } > > + > > + /* All '0's are just unused parameters. */ > > + ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL); > > + > > + tdx_global_init_status = TDX_GLOBAL_INIT_DONE; > > + if (ret) > > + tdx_global_init_status |= TDX_GLOBAL_INIT_FAILED; > > If entropy is lacking (rdrand failure), TDH_SYS_INIT can return TDX_SYS_BUSY. > In such case, we should allow the caller to retry or make this function retry > instead of marking error stickily. The spec says: TDX_SYS_BUSY The operation was invoked when another TDX module operation was in progress. The operation may be retried. So I don't see how entropy is lacking is related to this error. Perhaps you were mixing up with KEY.CONFIG?
On Sun, Mar 12, 2023 at 11:08:44PM +0000, "Huang, Kai" <kai.huang@intel.com> wrote: > On Wed, 2023-03-08 at 14:27 -0800, Isaku Yamahata wrote: > > > + > > > +static int try_init_module_global(void) > > > +{ > > > + int ret; > > > + > > > + /* > > > + * The TDX module global initialization only needs to be done > > > + * once on any cpu. > > > + */ > > > + spin_lock(&tdx_global_init_lock); > > > + > > > + if (tdx_global_init_status & TDX_GLOBAL_INIT_DONE) { > > > + ret = tdx_global_init_status & TDX_GLOBAL_INIT_FAILED ? > > > + -EINVAL : 0; > > > + goto out; > > > + } > > > + > > > + /* All '0's are just unused parameters. */ > > > + ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL); > > > + > > > + tdx_global_init_status = TDX_GLOBAL_INIT_DONE; > > > + if (ret) > > > + tdx_global_init_status |= TDX_GLOBAL_INIT_FAILED; > > > > If entropy is lacking (rdrand failure), TDH_SYS_INIT can return TDX_SYS_BUSY. > > In such case, we should allow the caller to retry or make this function retry > > instead of marking error stickily. > > The spec says: > > TDX_SYS_BUSY The operation was invoked when another TDX module > operation was in progress. The operation may be retried. > > So I don't see how entropy is lacking is related to this error. Perhaps you > were mixing up with KEY.CONFIG? TDH.SYS.INIT() initializes global canary value. TDX module is compiled with strong stack protector enabled by clang and canary value needs to be initialized. By default, the canary value is stored at %fsbase:<STACK_CANARY_OFFSET 0x28> Although this is a job for libc or language runtime, TDX modules has to do it itself because it's stand alone. From tdh_sys_init.c _STATIC_INLINE_ api_error_type tdx_init_stack_canary(void) { ia32_rflags_t rflags = {.raw = 0}; uint64_t canary; if (!ia32_rdrand(&rflags, &canary)) { return TDX_SYS_BUSY; } ... last_page_ptr->stack_canary.canary = canary;
On Mon, 2023-03-13 at 16:49 -0700, Isaku Yamahata wrote: > On Sun, Mar 12, 2023 at 11:08:44PM +0000, > "Huang, Kai" <kai.huang@intel.com> wrote: > > > On Wed, 2023-03-08 at 14:27 -0800, Isaku Yamahata wrote: > > > > + > > > > +static int try_init_module_global(void) > > > > +{ > > > > + int ret; > > > > + > > > > + /* > > > > + * The TDX module global initialization only needs to be done > > > > + * once on any cpu. > > > > + */ > > > > + spin_lock(&tdx_global_init_lock); > > > > + > > > > + if (tdx_global_init_status & TDX_GLOBAL_INIT_DONE) { > > > > + ret = tdx_global_init_status & TDX_GLOBAL_INIT_FAILED ? > > > > + -EINVAL : 0; > > > > + goto out; > > > > + } > > > > + > > > > + /* All '0's are just unused parameters. */ > > > > + ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL); > > > > + > > > > + tdx_global_init_status = TDX_GLOBAL_INIT_DONE; > > > > + if (ret) > > > > + tdx_global_init_status |= TDX_GLOBAL_INIT_FAILED; > > > > > > If entropy is lacking (rdrand failure), TDH_SYS_INIT can return TDX_SYS_BUSY. > > > In such case, we should allow the caller to retry or make this function retry > > > instead of marking error stickily. > > > > The spec says: > > > > TDX_SYS_BUSY The operation was invoked when another TDX module > > operation was in progress. The operation may be retried. > > > > So I don't see how entropy is lacking is related to this error. Perhaps you > > were mixing up with KEY.CONFIG? > > TDH.SYS.INIT() initializes global canary value. TDX module is compiled with > strong stack protector enabled by clang and canary value needs to be > initialized. By default, the canary value is stored at > %fsbase:<STACK_CANARY_OFFSET 0x28> > > Although this is a job for libc or language runtime, TDX modules has to do it > itself because it's stand alone. 
> > From tdh_sys_init.c > _STATIC_INLINE_ api_error_type tdx_init_stack_canary(void) > { > ia32_rflags_t rflags = {.raw = 0}; > uint64_t canary; > if (!ia32_rdrand(&rflags, &canary)) > { > return TDX_SYS_BUSY; > } > ... > last_page_ptr->stack_canary.canary = canary; > > Then it is a hidden behaviour of the TDX module that is not reflected in the spec. I am not sure whether we should handle because: 1) This is an extremely rare case. Kernel would be basically under attack if such error happened. In the current series we don't handle such case in KEY.CONFIG either but just leave a comment (see patch 13). 2) Not sure whether this will be changed in the future. So I think we should keep as is.
On Tue, Mar 14, 2023 at 01:50:40AM +0000, "Huang, Kai" <kai.huang@intel.com> wrote: > On Mon, 2023-03-13 at 16:49 -0700, Isaku Yamahata wrote: > > On Sun, Mar 12, 2023 at 11:08:44PM +0000, > > "Huang, Kai" <kai.huang@intel.com> wrote: > > > > > On Wed, 2023-03-08 at 14:27 -0800, Isaku Yamahata wrote: > > > > > + > > > > > +static int try_init_module_global(void) > > > > > +{ > > > > > + int ret; > > > > > + > > > > > + /* > > > > > + * The TDX module global initialization only needs to be done > > > > > + * once on any cpu. > > > > > + */ > > > > > + spin_lock(&tdx_global_init_lock); > > > > > + > > > > > + if (tdx_global_init_status & TDX_GLOBAL_INIT_DONE) { > > > > > + ret = tdx_global_init_status & TDX_GLOBAL_INIT_FAILED ? > > > > > + -EINVAL : 0; > > > > > + goto out; > > > > > + } > > > > > + > > > > > + /* All '0's are just unused parameters. */ > > > > > + ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL); > > > > > + > > > > > + tdx_global_init_status = TDX_GLOBAL_INIT_DONE; > > > > > + if (ret) > > > > > + tdx_global_init_status |= TDX_GLOBAL_INIT_FAILED; > > > > > > > > If entropy is lacking (rdrand failure), TDH_SYS_INIT can return TDX_SYS_BUSY. > > > > In such case, we should allow the caller to retry or make this function retry > > > > instead of marking error stickily. > > > > > > The spec says: > > > > > > TDX_SYS_BUSY The operation was invoked when another TDX module > > > operation was in progress. The operation may be retried. > > > > > > So I don't see how entropy is lacking is related to this error. Perhaps you > > > were mixing up with KEY.CONFIG? > > > > TDH.SYS.INIT() initializes global canary value. TDX module is compiled with > > strong stack protector enabled by clang and canary value needs to be > > initialized. By default, the canary value is stored at > > %fsbase:<STACK_CANARY_OFFSET 0x28> > > > > Although this is a job for libc or language runtime, TDX modules has to do it > > itself because it's stand alone. 
> > > > From tdh_sys_init.c > > _STATIC_INLINE_ api_error_type tdx_init_stack_canary(void) > > { > > ia32_rflags_t rflags = {.raw = 0}; > > uint64_t canary; > > if (!ia32_rdrand(&rflags, &canary)) > > { > > return TDX_SYS_BUSY; > > } > > ... > > last_page_ptr->stack_canary.canary = canary; > > > > > > Then it is a hidden behaviour of the TDX module that is not reflected in the > spec. I am not sure whether we should handle because: > > 1) This is an extremely rare case. Kernel would be basically under attack if > such error happened. In the current series we don't handle such case in > KEY.CONFIG either but just leave a comment (see patch 13). > > 2) Not sure whether this will be changed in the future. > > So I think we should keep as is. TDX 1.5 spec introduced TDX_RND_NO_ENTROPY status code. For TDX 1.0, let's postpone it to TDX 1.5 activity.
On 3/13/23 21:02, Isaku Yamahata wrote: >> Then it is a hidden behaviour of the TDX module that is not reflected in the >> spec. I am not sure whether we should handle because: >> >> 1) This is an extremely rare case. Kernel would be basically under attack if >> such error happened. In the current series we don't handle such case in >> KEY.CONFIG either but just leave a comment (see patch 13). >> >> 2) Not sure whether this will be changed in the future. >> >> So I think we should keep as is. > TDX 1.5 spec introduced TDX_RND_NO_ENTROPY status code. For TDX 1.0, let's > postpone it to TDX 1.5 activity. What the heck does this mean? I don't remember seeing any code here that checks for "TDX 1.0" or "TDX 1.5". That means that this code needs to work with _any_ TDX version. Are features being added to new versions that break code written for old versions?
On 3/13/23 18:50, Huang, Kai wrote: > On Mon, 2023-03-13 at 16:49 -0700, Isaku Yamahata wrote: >> On Sun, Mar 12, 2023 at 11:08:44PM +0000, >> "Huang, Kai" <kai.huang@intel.com> wrote: >> >>> On Wed, 2023-03-08 at 14:27 -0800, Isaku Yamahata wrote: >>>>> + >>>>> +static int try_init_module_global(void) >>>>> +{ >>>>> + int ret; >>>>> + >>>>> + /* >>>>> + * The TDX module global initialization only needs to be done >>>>> + * once on any cpu. >>>>> + */ >>>>> + spin_lock(&tdx_global_init_lock); >>>>> + >>>>> + if (tdx_global_init_status & TDX_GLOBAL_INIT_DONE) { >>>>> + ret = tdx_global_init_status & TDX_GLOBAL_INIT_FAILED ? >>>>> + -EINVAL : 0; >>>>> + goto out; >>>>> + } >>>>> + >>>>> + /* All '0's are just unused parameters. */ >>>>> + ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL); >>>>> + >>>>> + tdx_global_init_status = TDX_GLOBAL_INIT_DONE; >>>>> + if (ret) >>>>> + tdx_global_init_status |= TDX_GLOBAL_INIT_FAILED; >>>> >>>> If entropy is lacking (rdrand failure), TDH_SYS_INIT can return TDX_SYS_BUSY. >>>> In such case, we should allow the caller to retry or make this function retry >>>> instead of marking error stickily. >>> >>> The spec says: >>> >>> TDX_SYS_BUSY The operation was invoked when another TDX module >>> operation was in progress. The operation may be retried. >>> >>> So I don't see how entropy is lacking is related to this error. Perhaps you >>> were mixing up with KEY.CONFIG? >> >> TDH.SYS.INIT() initializes global canary value. TDX module is compiled with >> strong stack protector enabled by clang and canary value needs to be >> initialized. By default, the canary value is stored at >> %fsbase:<STACK_CANARY_OFFSET 0x28> >> >> Although this is a job for libc or language runtime, TDX modules has to do it >> itself because it's stand alone. 
>> >> From tdh_sys_init.c >> _STATIC_INLINE_ api_error_type tdx_init_stack_canary(void) >> { >> ia32_rflags_t rflags = {.raw = 0}; >> uint64_t canary; >> if (!ia32_rdrand(&rflags, &canary)) >> { >> return TDX_SYS_BUSY; >> } >> ... >> last_page_ptr->stack_canary.canary = canary; >> >> > > Then it is a hidden behaviour of the TDX module that is not reflected in the > spec. This is true. Could you please go ask the TDX module folks to fix this up? > I am not sure whether we should handle because: > > 1) This is an extremely rare case. Kernel would be basically under attack if > such error happened. In the current series we don't handle such case in > KEY.CONFIG either but just leave a comment (see patch 13). Rare, yes. Under attack? I'm not sure where you get that from. Look at the SDM: > Under heavy load, with multiple cores executing RDRAND in parallel, it is possible, though unlikely, for the demand > of random numbers by software processes/threads to exceed the rate at which the random number generator > hardware can supply them. This will lead to the RDRAND instruction returning no data transitorily. The RDRAND > instruction indicates the occurrence of this rare situation by clearing the CF flag. That doesn't talk about attacks. > 2) Not sure whether this will be changed in the future. > > So I think we should keep as is. TDX_SYS_BUSY really is missing some nuance. You *REALLY* want to retry RDRAND failures. But, if you have VMM locking and don't expect two users calling into the TDX module then TDX_SYS_BUSY from a busy *module* is a bad (and probably fatal) signal. I suspect we should just throw a few retries in the seamcall() infrastructure to retry in the case of TDX_SYS_BUSY. It'll take care of RDRAND failures. If a retry loop fails to resolve it, then we should probably dump a warning and return an error. Just do this once, in common code.
On Mon, Mar 13, 2023 at 10:45:45PM -0700, Dave Hansen <dave.hansen@intel.com> wrote: > On 3/13/23 21:02, Isaku Yamahata wrote: > >> Then it is a hidden behaviour of the TDX module that is not reflected in the > >> spec. I am not sure whether we should handle because: > >> > >> 1) This is an extremely rare case. Kernel would be basically under attack if > >> such error happened. In the current series we don't handle such case in > >> KEY.CONFIG either but just leave a comment (see patch 13). > >> > >> 2) Not sure whether this will be changed in the future. > >> > >> So I think we should keep as is. > > TDX 1.5 spec introduced TDX_RND_NO_ENTROPY status code. For TDX 1.0, let's > > postpone it to TDX 1.5 activity. > > What the heck does this mean? > > I don't remember seeing any code here that checks for "TDX 1.0" or "TDX > 1.5". That means that this code needs to work with _any_ TDX version. > > Are features being added to new versions that break code written for old > versions? No new feature, but new error code. TDX_RND_NO_ENTROPY, lack of entropy. For TDX 1.0, some APIs return TDX_SYS_BUSY. It can be contention(lock failure) or the lack of entropy. The caller can't distinguish them. For TDX 1.5, they return TDX_RND_NO_ENTROPY instead of TDX_SYS_BUSY in the case of rdrand/rdseed failure. Because both TDX_SYS_BUSY and TDX_RND_NO_ENTROPY are recoverable error (bit 63 error=1, bit 62 non_recoverable=0), the caller can check error bit and non_recoverable bit for retry.
On 3/14/23 10:16, Isaku Yamahata wrote: >>> TDX 1.5 spec introduced TDX_RND_NO_ENTROPY status code. For TDX 1.0, let's >>> postpone it to TDX 1.5 activity. >> What the heck does this mean? >> >> I don't remember seeing any code here that checks for "TDX 1.0" or "TDX >> 1.5". That means that this code needs to work with _any_ TDX version. >> >> Are features being added to new versions that break code written for old >> versions? > No new feature, but new error code. TDX_RND_NO_ENTROPY, lack of entropy. > For TDX 1.0, some APIs return TDX_SYS_BUSY. It can be contention(lock failure) > or the lack of entropy. The caller can't distinguish them. > For TDX 1.5, they return TDX_RND_NO_ENTROPY instead of TDX_SYS_BUSY in the case > of rdrand/rdseed failure. > > Because both TDX_SYS_BUSY and TDX_RND_NO_ENTROPY are recoverable error > (bit 63 error=1, bit 62 non_recoverable=0), the caller can check error bit and > non_recoverable bit for retry. Oh, that's actually really nice. It separates out the "RDRAND is empty" issue from the "the VMM should have had a lock here" issue. For now, let's consider TDX_SYS_BUSY to basically indicate a non-fatal kernel bug: the kernel called TDX in a way that it shouldn't have. We'll treat it in the kernel as non-recoverable. We'll return an error, WARN_ON(), and keep on running. A follow-on patch can add generic TDX_RND_NO_ENTROPY retry support to the seamcall infrastructure.
On Tue, 2023-03-14 at 08:48 -0700, Dave Hansen wrote: > On 3/13/23 18:50, Huang, Kai wrote: > > On Mon, 2023-03-13 at 16:49 -0700, Isaku Yamahata wrote: > > > On Sun, Mar 12, 2023 at 11:08:44PM +0000, > > > "Huang, Kai" <kai.huang@intel.com> wrote: > > > > > > > On Wed, 2023-03-08 at 14:27 -0800, Isaku Yamahata wrote: > > > > > > + > > > > > > +static int try_init_module_global(void) > > > > > > +{ > > > > > > + int ret; > > > > > > + > > > > > > + /* > > > > > > + * The TDX module global initialization only needs to be done > > > > > > + * once on any cpu. > > > > > > + */ > > > > > > + spin_lock(&tdx_global_init_lock); > > > > > > + > > > > > > + if (tdx_global_init_status & TDX_GLOBAL_INIT_DONE) { > > > > > > + ret = tdx_global_init_status & TDX_GLOBAL_INIT_FAILED ? > > > > > > + -EINVAL : 0; > > > > > > + goto out; > > > > > > + } > > > > > > + > > > > > > + /* All '0's are just unused parameters. */ > > > > > > + ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL); > > > > > > + > > > > > > + tdx_global_init_status = TDX_GLOBAL_INIT_DONE; > > > > > > + if (ret) > > > > > > + tdx_global_init_status |= TDX_GLOBAL_INIT_FAILED; > > > > > > > > > > If entropy is lacking (rdrand failure), TDH_SYS_INIT can return TDX_SYS_BUSY. > > > > > In such case, we should allow the caller to retry or make this function retry > > > > > instead of marking error stickily. > > > > > > > > The spec says: > > > > > > > > TDX_SYS_BUSY The operation was invoked when another TDX module > > > > operation was in progress. The operation may be retried. > > > > > > > > So I don't see how entropy is lacking is related to this error. Perhaps you > > > > were mixing up with KEY.CONFIG? > > > > > > TDH.SYS.INIT() initializes global canary value. TDX module is compiled with > > > strong stack protector enabled by clang and canary value needs to be > > > initialized. 
By default, the canary value is stored at > > > %fsbase:<STACK_CANARY_OFFSET 0x28> > > > > > > Although this is a job for libc or language runtime, TDX modules has to do it > > > itself because it's stand alone. > > > > > > From tdh_sys_init.c > > > _STATIC_INLINE_ api_error_type tdx_init_stack_canary(void) > > > { > > > ia32_rflags_t rflags = {.raw = 0}; > > > uint64_t canary; > > > if (!ia32_rdrand(&rflags, &canary)) > > > { > > > return TDX_SYS_BUSY; > > > } > > > ... > > > last_page_ptr->stack_canary.canary = canary; > > > > > > > > > > Then it is a hidden behaviour of the TDX module that is not reflected in the > > spec. > > This is true. Could you please go ask the TDX module folks to fix this up? Sure will do. To make sure, you mean we should ask TDX module guys to add the new TDX_RND_NO_ENTROPY error code to TDX module 1.0? "another TDX module operation was in progress" and "running out of entropy" are different thing and should not be mixed together IMHO. > > > I am not sure whether we should handle because: > > > > 1) This is an extremely rare case. Kernel would be basically under attack if > > such error happened. In the current series we don't handle such case in > > KEY.CONFIG either but just leave a comment (see patch 13). > > Rare, yes. Under attack? I'm not sure where you get that from. Look > at the SDM: > > > Under heavy load, with multiple cores executing RDRAND in parallel, it is possible, though unlikely, for the demand > > of random numbers by software processes/threads to exceed the rate at which the random number generator > > hardware can supply them. This will lead to the RDRAND instruction returning no data transitorily. The RDRAND > > instruction indicates the occurrence of this rare situation by clearing the CF flag. > > That doesn't talk about attacks. Thanks for citing the documentation. I checked the kernel code before and it seems currently there's no code to call RDRAND very frequently. But yes we should not say "under attack". 
I have some old memory that someone said so (maybe me?). > > > 2) Not sure whether this will be changed in the future. > > > > So I think we should keep as is. > > TDX_SYS_BUSY really is missing some nuance. You *REALLY* want to retry > RDRAND failures. > OK. Agreed. Then I think the TDH.SYS.KEY.CONFIG should retry when running out of entropy too. > But, if you have VMM locking and don't expect two > users calling into the TDX module then TDX_SYS_BUSY from a busy *module* > is a bad (and probably fatal) signal. Yes, we have a lock to protect TDH.SYS.INIT from being called in parallel. W/o this entropy thing TDX_SYS_BUSY should never happen. > > I suspect we should just throw a few retries in the seamcall() > infrastructure to retry in the case of TDX_SYS_BUSY. It'll take care of > RDRAND failures. If a retry loop fails to resolve it, then we should > probably dump a warning and return an error. > > Just do this once, in common code. I can do that. Just want to make sure: do you want to retry TDX_SYS_BUSY, or retry TDX_RND_NO_ENTROPY (if we want to ask TDX module guys to change to return this value)? Also, even if we retry either TDX_SYS_BUSY or TDX_RND_NO_ENTROPY in common seamcall() code, it doesn't handle TDH.SYS.KEY.CONFIG, because sadly this SEAMCALL returns a different error code: TDX_KEY_GENERATION_FAILED Failed to generate a random key. This is typically caused by an entropy error of the CPU's random number generator, and may be impacted by RDSEED, RDRAND or PCONFIG executing on other LPs. The operation should be retried.
On Tue, Mar 07, 2023 at 03:13:50AM +1300, Kai Huang <kai.huang@intel.com> wrote: > +static int try_init_module_global(void) > +{ > + int ret; > + > + /* > + * The TDX module global initialization only needs to be done > + * once on any cpu. > + */ > + spin_lock(&tdx_global_init_lock); If I use tdx_cpu_enable() via kvm hardware_enable_all(), this function is called in the context IPI callback and the lockdep complains. Here is my patch to address it From 0c4022ffe8cd68dfb455c418eb65538e4e100115 Mon Sep 17 00:00:00 2001 Message-Id: <0c4022ffe8cd68dfb455c418eb65538e4e100115.1678926123.git.isaku.yamahata@intel.com> In-Reply-To: <d2aa2142665b8204b628232ab615c98090371c99.1678926122.git.isaku.yamahata@intel.com> References: <d2aa2142665b8204b628232ab615c98090371c99.1678926122.git.isaku.yamahata@intel.com> From: Isaku Yamahata <isaku.yamahata@intel.com> Date: Wed, 15 Mar 2023 14:26:37 -0700 Subject: [PATCH] x86/virt/vmx/tdx: Use raw spin lock instead of spin lock tdx_cpu_enable() can be called by IPI handler. The lockdep complains about spin lock as follows. Use raw spin lock. ============================= [ BUG: Invalid wait context ] 6.3.0-rc1-tdx-kvm-upstream+ #389 Not tainted ----------------------------- swapper/154/0 is trying to lock: ffffffffa7875e58 (tdx_global_init_lock){....}-{3:3}, at: tdx_cpu_enable+0x67/0x180 other info that might help us debug this: context-{2:2} no locks held by swapper/154/0. 
stack backtrace: Call Trace: <IRQ> dump_stack_lvl+0x64/0xb0 dump_stack+0x10/0x20 __lock_acquire+0x912/0xc30 lock_acquire.part.0+0x99/0x220 lock_acquire+0x60/0x170 _raw_spin_lock_irqsave+0x43/0x70 tdx_cpu_enable+0x67/0x180 vt_hardware_enable+0x3b/0x60 kvm_arch_hardware_enable+0xe7/0x2e0 hardware_enable_nolock+0x33/0x80 __flush_smp_call_function_queue+0xc4/0x590 generic_smp_call_function_single_interrupt+0x1a/0xb0 __sysvec_call_function+0x48/0x200 sysvec_call_function+0xad/0xd0 </IRQ> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> --- arch/x86/virt/vmx/tdx/tdx.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 2ee37a5dedcf..e1c8ffad7406 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -41,7 +41,7 @@ static u32 tdx_guest_keyid_start __ro_after_init; static u32 tdx_nr_guest_keyids __ro_after_init; static unsigned int tdx_global_init_status; -static DEFINE_SPINLOCK(tdx_global_init_lock); +static DEFINE_RAW_SPINLOCK(tdx_global_init_lock); #define TDX_GLOBAL_INIT_DONE _BITUL(0) #define TDX_GLOBAL_INIT_FAILED _BITUL(1) @@ -349,6 +349,7 @@ static void tdx_trace_seamcalls(u64 level) static int try_init_module_global(void) { + unsigned long flags; u64 tsx_ctrl; int ret; @@ -356,7 +357,7 @@ static int try_init_module_global(void) * The TDX module global initialization only needs to be done * once on any cpu. */ - spin_lock(&tdx_global_init_lock); + raw_spin_lock_irqsave(&tdx_global_init_lock, flags); if (tdx_global_init_status & TDX_GLOBAL_INIT_DONE) { ret = tdx_global_init_status & TDX_GLOBAL_INIT_FAILED ? @@ -373,7 +374,7 @@ static int try_init_module_global(void) if (ret) tdx_global_init_status |= TDX_GLOBAL_INIT_FAILED; out: - spin_unlock(&tdx_global_init_lock); + raw_spin_unlock_irqrestore(&tdx_global_init_lock, flags); if (ret) { if (trace_boot_seamcalls)
On Wed, Mar 15, 2023 at 05:31:02PM -0700, Isaku Yamahata <isaku.yamahata@gmail.com> wrote:
> On Tue, Mar 07, 2023 at 03:13:50AM +1300,
> Kai Huang <kai.huang@intel.com> wrote:
> 
> > +static int try_init_module_global(void)
> > +{
> > +	int ret;
> > +
> > +	/*
> > +	 * The TDX module global initialization only needs to be done
> > +	 * once on any cpu.
> > +	 */
> > +	spin_lock(&tdx_global_init_lock);
> 
> If I use tdx_cpu_enable() via kvm hardware_enable_all(), this function is
> called in the context of an IPI callback and lockdep complains.  Here is
> my patch to address it.
> 
[...]
> @@ -356,7 +357,7 @@ static int try_init_module_global(void)
>  	 * The TDX module global initialization only needs to be done
>  	 * once on any cpu.
>  	 */
> -	spin_lock(&tdx_global_init_lock);
> +	raw_spin_lock_irqsave(&tdx_global_init_lock, flags);

As hardware_enable_all() uses cpus_read_lock(), irqsave isn't needed.
This line should be raw_spin_lock().
> >
> > @@ -356,7 +357,7 @@ static int try_init_module_global(void)
> >  	 * The TDX module global initialization only needs to be done
> >  	 * once on any cpu.
> >  	 */
> > -	spin_lock(&tdx_global_init_lock);
> > +	raw_spin_lock_irqsave(&tdx_global_init_lock, flags);
> 
> As hardware_enable_all() uses cpus_read_lock(), irqsave isn't needed.
> This line should be raw_spin_lock().

OK.  I had missed that in a PREEMPT_RT kernel the spinlock is converted
to a sleeping lock, so I'll change it to use raw_spin_lock() as we
discussed.  Thanks.
On Wed, 2023-03-15 at 11:10 +0000, Huang, Kai wrote:
> On Tue, 2023-03-14 at 08:48 -0700, Dave Hansen wrote:
> > On 3/13/23 18:50, Huang, Kai wrote:
> > > On Mon, 2023-03-13 at 16:49 -0700, Isaku Yamahata wrote:
> > > > On Sun, Mar 12, 2023 at 11:08:44PM +0000,
> > > > "Huang, Kai" <kai.huang@intel.com> wrote:
> > > > 
> > > > > On Wed, 2023-03-08 at 14:27 -0800, Isaku Yamahata wrote:
> > > > > > > +
> > > > > > > +static int try_init_module_global(void)
> > > > > > > +{
> > > > > > > +	int ret;
> > > > > > > +
> > > > > > > +	/*
> > > > > > > +	 * The TDX module global initialization only needs to be done
> > > > > > > +	 * once on any cpu.
> > > > > > > +	 */
> > > > > > > +	spin_lock(&tdx_global_init_lock);
> > > > > > > +
> > > > > > > +	if (tdx_global_init_status & TDX_GLOBAL_INIT_DONE) {
> > > > > > > +		ret = tdx_global_init_status & TDX_GLOBAL_INIT_FAILED ?
> > > > > > > +			-EINVAL : 0;
> > > > > > > +		goto out;
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	/* All '0's are just unused parameters. */
> > > > > > > +	ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL);
> > > > > > > +
> > > > > > > +	tdx_global_init_status = TDX_GLOBAL_INIT_DONE;
> > > > > > > +	if (ret)
> > > > > > > +		tdx_global_init_status |= TDX_GLOBAL_INIT_FAILED;
> > > > > > 
> > > > > > If entropy is lacking (rdrand failure), TDH_SYS_INIT can return
> > > > > > TDX_SYS_BUSY.  In such a case, we should allow the caller to retry,
> > > > > > or make this function retry, instead of marking the error stickily.
> > > > > 
> > > > > The spec says:
> > > > > 
> > > > > 	TDX_SYS_BUSY	The operation was invoked when another TDX module
> > > > > 			operation was in progress.  The operation may be retried.
> > > > > 
> > > > > So I don't see how lack of entropy is related to this error.  Perhaps
> > > > > you were mixing it up with KEY.CONFIG?
> > > > 
> > > > TDH.SYS.INIT() initializes the global canary value.  The TDX module is
> > > > compiled with the strong stack protector enabled by clang, and the
> > > > canary value needs to be initialized.  By default, the canary value is
> > > > stored at %fsbase:<STACK_CANARY_OFFSET 0x28>.
> > > > 
> > > > Although this is a job for libc or the language runtime, the TDX module
> > > > has to do it itself because it's standalone.
> > > > 
> > > > From tdh_sys_init.c:
> > > > 
> > > > _STATIC_INLINE_ api_error_type tdx_init_stack_canary(void)
> > > > {
> > > >     ia32_rflags_t rflags = {.raw = 0};
> > > >     uint64_t canary;
> > > >     if (!ia32_rdrand(&rflags, &canary))
> > > >     {
> > > >         return TDX_SYS_BUSY;
> > > >     }
> > > >     ...
> > > >     last_page_ptr->stack_canary.canary = canary;
> > > 
> > > Then it is a hidden behaviour of the TDX module that is not reflected
> > > in the spec.
> > 
> > This is true.  Could you please go ask the TDX module folks to fix this up?
> 
> Sure, will do.
> 
> To make sure: you mean we should ask the TDX module guys to add the new
> TDX_RND_NO_ENTROPY error code to TDX module 1.0?
> 
> "Another TDX module operation was in progress" and "running out of
> entropy" are different things and should not be mixed together IMHO.
> 
> > > I am not sure whether we should handle this because:
> > > 
> > > 1) This is an extremely rare case.  The kernel would basically be under
> > > attack if such an error happened.  In the current series we don't handle
> > > this case in KEY.CONFIG either, but just leave a comment (see patch 13).
> > 
> > Rare, yes.  Under attack?  I'm not sure where you get that from.  Look
> > at the SDM:
> > 
> > > Under heavy load, with multiple cores executing RDRAND in parallel,
> > > it is possible, though unlikely, for the demand of random numbers by
> > > software processes/threads to exceed the rate at which the random
> > > number generator hardware can supply them.  This will lead to the
> > > RDRAND instruction returning no data transitorily.  The RDRAND
> > > instruction indicates the occurrence of this rare situation by
> > > clearing the CF flag.
> > 
> > That doesn't talk about attacks.
> 
> Thanks for citing the documentation.  I checked the kernel code before,
> and it seems there is currently no code that calls RDRAND very
> frequently.  But yes, we should not say "under attack".  I have some old
> memory that someone said so (maybe me?).
> 
> > > 2) Not sure whether this will be changed in the future.
> > > 
> > > So I think we should keep it as is.
> > 
> > TDX_SYS_BUSY really is missing some nuance.  You *REALLY* want to retry
> > RDRAND failures.
> 
> OK.  Agreed.  Then I think TDH.SYS.KEY.CONFIG should retry when running
> out of entropy too.
> 
> > But, if you have VMM locking and don't expect two users calling into
> > the TDX module, then TDX_SYS_BUSY from a busy *module* is a bad (and
> > probably fatal) signal.
> 
> Yes, we have a lock to protect TDH.SYS.INIT from being called in
> parallel.  Without this entropy thing, TDX_SYS_BUSY should never happen.
> 
> > I suspect we should just throw a few retries in the seamcall()
> > infrastructure to retry in the case of TDX_SYS_BUSY.  It'll take care
> > of RDRAND failures.  If a retry loop fails to resolve it, then we
> > should probably dump a warning and return an error.
> > 
> > Just do this once, in common code.
> 
> I can do that.  Just want to make sure: do you want to retry
> TDX_SYS_BUSY, or retry TDX_RND_NO_ENTROPY (if we want to ask the TDX
> module guys to change to return this value)?
> 
> Also, even if we retry either TDX_SYS_BUSY or TDX_RND_NO_ENTROPY in the
> common seamcall() code, it doesn't handle TDH.SYS.KEY.CONFIG, because
> sadly this SEAMCALL returns a different error code:
> 
> 	TDX_KEY_GENERATION_FAILED	Failed to generate a random key.  This is
> 					typically caused by an entropy error of the
> 					CPU's random number generator, and may
> 					be impacted by RDSEED, RDRAND or PCONFIG
> 					executing on other LPs.  The operation
> 					should be retried.

Hi Dave,

Sorry to ping, but could you check whether my understanding aligns with
what you suggested?
On 3/15/23 04:10, Huang, Kai wrote:
> I can do that.  Just want to make sure: do you want to retry
> TDX_SYS_BUSY, or retry TDX_RND_NO_ENTROPY (if we want to ask the TDX
> module guys to change to return this value)?

I'll put it this way:

	Linux is going to treat TDX_SYS_BUSY like a Linux bug and assume
	Linux is doing something wrong.  It'll mostly mean that users
	will see something nasty and may even cause Linux to give up on
	TDX.  In other words, the TDX module shouldn't use TDX_SYS_BUSY
	for things that aren't Linux's fault.

> Also, even if we retry either TDX_SYS_BUSY or TDX_RND_NO_ENTROPY in the
> common seamcall() code, it doesn't handle TDH.SYS.KEY.CONFIG, because
> sadly this SEAMCALL returns a different error code:
> 
> 	TDX_KEY_GENERATION_FAILED	Failed to generate a random key.  This is
> 					typically caused by an entropy error of the
> 					CPU's random number generator, and may
> 					be impacted by RDSEED, RDRAND or PCONFIG
> 					executing on other LPs.  The operation
> 					should be retried.

Sounds like we should just replace TDX_KEY_GENERATION_FAILED with
TDX_RND_NO_ENTROPY in cases where key generation fails because of a lack
of entropy.
On Thu, 2023-03-23 at 06:49 -0700, Hansen, Dave wrote:
> On 3/15/23 04:10, Huang, Kai wrote:
> > I can do that.  Just want to make sure: do you want to retry
> > TDX_SYS_BUSY, or retry TDX_RND_NO_ENTROPY (if we want to ask the TDX
> > module guys to change to return this value)?
> 
> I'll put it this way:
> 
> 	Linux is going to treat TDX_SYS_BUSY like a Linux bug and assume
> 	Linux is doing something wrong.  It'll mostly mean that users
> 	will see something nasty and may even cause Linux to give up on
> 	TDX.  In other words, the TDX module shouldn't use TDX_SYS_BUSY
> 	for things that aren't Linux's fault.
> 
> > Also, even if we retry either TDX_SYS_BUSY or TDX_RND_NO_ENTROPY in the
> > common seamcall() code, it doesn't handle TDH.SYS.KEY.CONFIG, because
> > sadly this SEAMCALL returns a different error code:
> > 
> > 	TDX_KEY_GENERATION_FAILED	Failed to generate a random key.  This is
> > 					typically caused by an entropy error of the
> > 					CPU's random number generator, and may
> > 					be impacted by RDSEED, RDRAND or PCONFIG
> > 					executing on other LPs.  The operation
> > 					should be retried.
> 
> Sounds like we should just replace TDX_KEY_GENERATION_FAILED with
> TDX_RND_NO_ENTROPY in cases where key generation fails because of a lack
> of entropy.

Thanks for the feedback.  I'll do the following; please let me know if I
have misunderstood anything:

1) In TDH.SYS.INIT, ask the TDX module team to return TDX_RND_NO_ENTROPY
   instead of TDX_SYS_BUSY when running out of entropy.

2) In TDH.SYS.KEY.CONFIG, ask the TDX module team to return
   TDX_RND_NO_ENTROPY instead of TDX_KEY_GENERATION_FAILED when running
   out of entropy.  Whether TDX_KEY_GENERATION_FAILED should still be
   kept is up to the TDX module team (because it looks like running
   concurrent PCONFIGs is also related).

3) Ask the TDX module team to always return TDX_RND_NO_ENTROPY in _ALL_
   SEAMCALLs, and to keep this behaviour for future TDX modules too.

4) In the common seamcall(), retry on TDX_RND_NO_ENTROPY.

In terms of how many times to retry, I will use a fixed value for now,
similar to the kernel code below:

#define RDRAND_RETRY_LOOPS	10

/* Unconditional execution of RDRAND and RDSEED */
static inline bool __must_check rdrand_long(unsigned long *v)
{
	bool ok;
	unsigned int retry = RDRAND_RETRY_LOOPS;
	do {
		asm volatile("rdrand %[out]"
			     CC_SET(c)
			     : CC_OUT(c) (ok), [out] "=r" (*v));
		if (ok)
			return true;
	} while (--retry);
	return false;
}
On 3/23/23 15:09, Huang, Kai wrote:
> 1) In TDH.SYS.INIT, ask the TDX module team to return TDX_RND_NO_ENTROPY
>    instead of TDX_SYS_BUSY when running out of entropy.
> 
> 2) In TDH.SYS.KEY.CONFIG, ask the TDX module team to return
>    TDX_RND_NO_ENTROPY instead of TDX_KEY_GENERATION_FAILED when running
>    out of entropy.  Whether TDX_KEY_GENERATION_FAILED should still be
>    kept is up to the TDX module team (because it looks like running
>    concurrent PCONFIGs is also related).
> 
> 3) Ask the TDX module team to always return TDX_RND_NO_ENTROPY in _ALL_
>    SEAMCALLs, and to keep this behaviour for future TDX modules too.

Yes, that's all fine.

> 4) In the common seamcall(), retry on TDX_RND_NO_ENTROPY.
> 
> In terms of how many times to retry, I will use a fixed value for now,
> similar to the kernel code below:
> 
> #define RDRAND_RETRY_LOOPS	10

Heck, you could even just use RDRAND_RETRY_LOOPS directly.  It's
hard(er) to bikeshed your choice of a random number that you didn't even
pick.
On Thu, 2023-03-23 at 15:12 -0700, Hansen, Dave wrote:
> On 3/23/23 15:09, Huang, Kai wrote:
[...]
> > 4) In the common seamcall(), retry on TDX_RND_NO_ENTROPY.
> > 
> > In terms of how many times to retry, I will use a fixed value for now,
> > similar to the kernel code below:
> > 
> > #define RDRAND_RETRY_LOOPS	10
> 
> Heck, you could even just use RDRAND_RETRY_LOOPS directly.  It's
> hard(er) to bikeshed your choice of a random number that you didn't even
> pick.

Yes, I'll just include the header and use it.  Thanks.
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index b489b5b9de5d..112a5b9bd5cd 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -102,8 +102,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 #ifdef CONFIG_INTEL_TDX_HOST
 bool platform_tdx_enabled(void);
+int tdx_cpu_enable(void);
+int tdx_enable(void);
 #else	/* !CONFIG_INTEL_TDX_HOST */
 static inline bool platform_tdx_enabled(void) { return false; }
+static inline int tdx_cpu_enable(void) { return -EINVAL; }
+static inline int tdx_enable(void) { return -EINVAL; }
 #endif	/* CONFIG_INTEL_TDX_HOST */
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index b65b838f3b5d..29127cb70f51 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -13,6 +13,10 @@
 #include <linux/errno.h>
 #include <linux/printk.h>
 #include <linux/smp.h>
+#include <linux/cpu.h>
+#include <linux/spinlock.h>
+#include <linux/percpu-defs.h>
+#include <linux/mutex.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
 #include <asm/tdx.h>
@@ -22,6 +26,18 @@ static u32 tdx_global_keyid __ro_after_init;
 static u32 tdx_guest_keyid_start __ro_after_init;
 static u32 tdx_nr_guest_keyids __ro_after_init;
 
+static unsigned int tdx_global_init_status;
+static DEFINE_SPINLOCK(tdx_global_init_lock);
+#define TDX_GLOBAL_INIT_DONE	_BITUL(0)
+#define TDX_GLOBAL_INIT_FAILED	_BITUL(1)
+
+static DEFINE_PER_CPU(unsigned int, tdx_lp_init_status);
+#define TDX_LP_INIT_DONE	_BITUL(0)
+#define TDX_LP_INIT_FAILED	_BITUL(1)
+
+static enum tdx_module_status_t tdx_module_status;
+static DEFINE_MUTEX(tdx_module_lock);
+
 /*
  * Use tdx_global_keyid to indicate that TDX is uninitialized.
  * This is used in TDX initialization error paths to take it from
@@ -159,3 +175,169 @@ static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
 	put_cpu();
 	return ret;
 }
+
+static int try_init_module_global(void)
+{
+	int ret;
+
+	/*
+	 * The TDX module global initialization only needs to be done
+	 * once on any cpu.
+	 */
+	spin_lock(&tdx_global_init_lock);
+
+	if (tdx_global_init_status & TDX_GLOBAL_INIT_DONE) {
+		ret = tdx_global_init_status & TDX_GLOBAL_INIT_FAILED ?
+			-EINVAL : 0;
+		goto out;
+	}
+
+	/* All '0's are just unused parameters. */
+	ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL);
+
+	tdx_global_init_status = TDX_GLOBAL_INIT_DONE;
+	if (ret)
+		tdx_global_init_status |= TDX_GLOBAL_INIT_FAILED;
+out:
+	spin_unlock(&tdx_global_init_lock);
+
+	return ret;
+}
+
+/**
+ * tdx_cpu_enable - Enable TDX on local cpu
+ *
+ * Do one-time TDX module per-cpu initialization SEAMCALL (and TDX module
+ * global initialization SEAMCALL if not done) on local cpu to make this
+ * cpu be ready to run any other SEAMCALLs.
+ *
+ * Note this function must be called when preemption is not possible
+ * (i.e. via SMP call or in per-cpu thread).  It is not IRQ safe either
+ * (i.e. cannot be called in per-cpu thread and via SMP call from remote
+ * cpu simultaneously).
+ *
+ * Return 0 on success, otherwise errors.
+ */
+int tdx_cpu_enable(void)
+{
+	unsigned int lp_status;
+	int ret;
+
+	if (!platform_tdx_enabled())
+		return -EINVAL;
+
+	lp_status = __this_cpu_read(tdx_lp_init_status);
+
+	/* Already done */
+	if (lp_status & TDX_LP_INIT_DONE)
+		return lp_status & TDX_LP_INIT_FAILED ? -EINVAL : 0;
+
+	/*
+	 * The TDX module global initialization is the very first step
+	 * to enable TDX.  Need to do it first (if hasn't been done)
+	 * before doing the per-cpu initialization.
+	 */
+	ret = try_init_module_global();
+
+	/*
+	 * If the module global initialization failed, there's no point
+	 * to do the per-cpu initialization.  Just mark it as done but
+	 * failed.
+	 */
+	if (ret)
+		goto update_status;
+
+	/* All '0's are just unused parameters */
+	ret = seamcall(TDH_SYS_LP_INIT, 0, 0, 0, 0, NULL, NULL);
+
+update_status:
+	lp_status = TDX_LP_INIT_DONE;
+	if (ret)
+		lp_status |= TDX_LP_INIT_FAILED;
+
+	this_cpu_write(tdx_lp_init_status, lp_status);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdx_cpu_enable);
+
+static int init_tdx_module(void)
+{
+	/*
+	 * TODO:
+	 *
+	 *  - Get TDX module information and TDX-capable memory regions.
+	 *  - Build the list of TDX-usable memory regions.
+	 *  - Construct a list of "TD Memory Regions" (TDMRs) to cover
+	 *    all TDX-usable memory regions.
+	 *  - Configure the TDMRs and the global KeyID to the TDX module.
+	 *  - Configure the global KeyID on all packages.
+	 *  - Initialize all TDMRs.
+	 *
+	 *  Return error before all steps are done.
+	 */
+	return -EINVAL;
+}
+
+static int __tdx_enable(void)
+{
+	int ret;
+
+	ret = init_tdx_module();
+	if (ret) {
+		pr_err("TDX module initialization failed (%d)\n", ret);
+		tdx_module_status = TDX_MODULE_ERROR;
+		/*
+		 * Just return one universal error code.
+		 * For now the caller cannot recover anyway.
+		 */
+		return -EINVAL;
+	}
+
+	pr_info("TDX module initialized.\n");
+	tdx_module_status = TDX_MODULE_INITIALIZED;
+
+	return 0;
+}
+
+/**
+ * tdx_enable - Enable TDX module to make it ready to run TDX guests
+ *
+ * This function assumes the caller has: 1) held read lock of CPU hotplug
+ * lock to prevent any new cpu from becoming online; 2) done both VMXON
+ * and tdx_cpu_enable() on all online cpus.
+ *
+ * This function can be called in parallel by multiple callers.
+ *
+ * Return 0 if TDX is enabled successfully, otherwise error.
+ */
+int tdx_enable(void)
+{
+	int ret;
+
+	if (!platform_tdx_enabled())
+		return -EINVAL;
+
+	lockdep_assert_cpus_held();
+
+	mutex_lock(&tdx_module_lock);
+
+	switch (tdx_module_status) {
+	case TDX_MODULE_UNKNOWN:
+		ret = __tdx_enable();
+		break;
+	case TDX_MODULE_INITIALIZED:
+		/* Already initialized, great, tell the caller. */
+		ret = 0;
+		break;
+	default:
+		/* Failed to initialize in the previous attempts */
+		ret = -EINVAL;
+		break;
	}
+
+	mutex_unlock(&tdx_module_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdx_enable);
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 48ad1a1ba737..4d6220e86ccf 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -4,6 +4,31 @@
 
 #include <linux/types.h>
 
+/*
+ * This file contains both macros and data structures defined by the TDX
+ * architecture and Linux defined software data structures and functions.
+ * The two should not be mixed together for better readability.  The
+ * architectural definitions come first.
+ */
+
+/*
+ * TDX module SEAMCALL leaf functions
+ */
+#define TDH_SYS_INIT		33
+#define TDH_SYS_LP_INIT		35
+
+/*
+ * Do not put any hardware-defined TDX structure representations below
+ * this comment!
+ */
+
+/* Kernel defined TDX module status during module initialization. */
+enum tdx_module_status_t {
+	TDX_MODULE_UNKNOWN,
+	TDX_MODULE_INITIALIZED,
+	TDX_MODULE_ERROR
+};
+
 struct tdx_module_output;
 u64 __seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
 	       struct tdx_module_output *out);
To enable TDX the kernel needs to initialize TDX from two perspectives:
1) Do a set of SEAMCALLs to initialize the TDX module to make it ready
to create and run TDX guests; 2) Do the per-cpu initialization SEAMCALL
on one logical cpu before the kernel wants to make any other SEAMCALLs
on that cpu (including those involved during module initialization and
running TDX guests).

The TDX module can be initialized only once in its lifetime.  Instead
of always initializing it at boot time, this implementation chooses an
"on demand" approach to initialize TDX only when there is a real need
(e.g. when requested by KVM).  This approach has the following pros:

1) It avoids consuming the memory that must be allocated by the kernel
and given to the TDX module as metadata (~1/256th of the TDX-usable
memory), and also saves the CPU cycles of initializing the TDX module
(and the metadata) when TDX is not used at all.

2) The TDX module design allows it to be updated while the system is
running.  The update procedure shares quite a few steps with this "on
demand" initialization mechanism.  The hope is that much of the "on
demand" mechanism can be shared with a future "update" mechanism.  A
boot-time TDX module implementation would not be able to share much
code with the update mechanism.

3) Making a SEAMCALL requires VMX to be enabled.  Currently, only the
KVM code mucks with VMX enabling.  If the TDX module were to be
initialized separately from KVM (like at boot), the boot code would
need to be taught how to muck with VMX enabling and KVM would need to
be taught how to cope with that.  Making KVM itself responsible for TDX
initialization lets the rest of the kernel stay blissfully unaware of
VMX.

Similar to module initialization, also make the per-cpu initialization
"on demand" as it also depends on VMX being enabled.

Add two functions, tdx_enable() and tdx_cpu_enable(), to enable the TDX
module and enable TDX on the local cpu, respectively.  For now
tdx_enable() is a placeholder.  The TODO list will be pared down as
functionality is added.

In tdx_enable(), use a state machine protected by a mutex to make sure
the initialization will only be done once, as tdx_enable() can be
called multiple times (e.g. the KVM module can be reloaded) and may be
called concurrently by other kernel components in the future.

The per-cpu initialization on each cpu can only be done once during the
module's lifetime.  Use a per-cpu variable to track its status to make
sure it is only done once in tdx_cpu_enable().  Also, a SEAMCALL to do
the TDX module global initialization must be done once on any logical
cpu before any per-cpu initialization SEAMCALL.  Do it inside
tdx_cpu_enable() too (if it hasn't been done).

tdx_enable() can potentially invoke SEAMCALLs on any online cpus.  The
per-cpu initialization must be done before those SEAMCALLs are invoked
on some cpu.  To keep things simple, in tdx_cpu_enable(), always do the
per-cpu initialization regardless of whether the TDX module has been
initialized or not.  And in tdx_enable(), don't call tdx_cpu_enable()
but assume the caller has disabled CPU hotplug and done VMXON and
tdx_cpu_enable() on all online cpus before calling tdx_enable().

Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v9 -> v10:
 - Merged the patch to handle per-cpu initialization to this patch to
   tell the story better.
 - Changed how to handle the per-cpu initialization to only provide a
   tdx_cpu_enable() function to let the user of TDX do it when the
   user wants to run TDX code on a certain cpu.
 - Changed tdx_enable() to not call cpus_read_lock() explicitly, but
   call lockdep_assert_cpus_held() to assume the caller has done that.
 - Improved comments around tdx_enable() and tdx_cpu_enable().
 - Improved changelog to tell the story better accordingly.

v8 -> v9:
 - Removed detailed TODO list in the changelog (Dave).
 - Added back steps to do module global initialization and per-cpu
   initialization in the TODO list comment.
 - Moved the 'enum tdx_module_status_t' from tdx.c to local tdx.h.

v7 -> v8:
 - Refined changelog (Dave).
 - Removed "all BIOS-enabled cpus" related code (Peter/Thomas/Dave).
 - Added a "TODO list" comment in init_tdx_module() to list all steps of
   initializing the TDX module to tell the story (Dave).
 - Made tdx_enable() universally return -EINVAL, and removed nonsense
   comments (Dave).
 - Simplified __tdx_enable() to only handle success or failure.
 - TDX_MODULE_SHUTDOWN -> TDX_MODULE_ERROR.
 - Removed TDX_MODULE_NONE (not loaded) as it is not necessary.
 - Improved comments (Dave).
 - Pointed out 'tdx_module_status' is a software thing (Dave).

v6 -> v7:
 - No change.

v5 -> v6:
 - Added code to set status to TDX_MODULE_NONE if the TDX module is not
   loaded (Chao).
 - Added Chao's Reviewed-by.
 - Improved comments around cpus_read_lock().

v3 -> v5 (no feedback on v4):
 - Removed the check that SEAMRR and TDX KeyID have been detected on
   all present cpus.
 - Removed tdx_detect().
 - Added num_online_cpus() to MADT-enabled CPUs check within the CPU
   hotplug lock and return early with error message.
 - Improved dmesg printing for TDX module detection and initialization.

---
 arch/x86/include/asm/tdx.h  |   4 +
 arch/x86/virt/vmx/tdx/tdx.c | 182 ++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  25 +++++
 3 files changed, 211 insertions(+)