Message ID | 1fa1eb80238dc19b4c732706b40604169316eb34.1692962263.git.kai.huang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | TDX host kernel support | expand |
On Sat, 2023-08-26 at 00:14 +1200, Kai Huang wrote: > There are two problems in terms of using kexec() to boot to a new > kernel > when the old kernel has enabled TDX: 1) Part of the memory pages are > still TDX private pages; 2) There might be dirty cachelines > associated > with TDX private pages. Does TDX support hibernate? I'm wondering about two potential problems: 1. Reading/writing private pages from the direct map on save/restore 2. The seam module needing to be re-inited (the tdx_enable() stuff) If that's the case you could have something like the below to just block it when TDX could be in use: diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c index 2b4a946a6ff5..3b1b7202452d 100644 --- a/kernel/power/hibernate.c +++ b/kernel/power/hibernate.c @@ -84,7 +84,8 @@ bool hibernation_available(void) { return nohibernate == 0 && !security_locked_down(LOCKDOWN_HIBERNATION) && - !secretmem_active() && !cxl_mem_active(); + !secretmem_active() && !cxl_mem_active() && + !platform_tdx_enabled(); } /** Or maybe better, it could check tdx_module_status? But there is no way to read that variable from hibernate.
On 9/15/23 10:43, Edgecombe, Rick P wrote: > On Sat, 2023-08-26 at 00:14 +1200, Kai Huang wrote: >> There are two problems in terms of using kexec() to boot to a new >> kernel when the old kernel has enabled TDX: 1) Part of the memory >> pages are still TDX private pages; 2) There might be dirty >> cachelines associated with TDX private pages. > Does TDX support hibernate? No. There's a whole bunch of volatile state that's generated inside the CPU and never leaves the CPU, like the ephemeral key that protects TDX module memory. SGX, for instance, never even supported suspend, IIRC. Enclaves just die and have to be rebuilt.
On Fri, 2023-09-15 at 10:50 -0700, Dave Hansen wrote: > On 9/15/23 10:43, Edgecombe, Rick P wrote: > > On Sat, 2023-08-26 at 00:14 +1200, Kai Huang wrote: > > > There are two problems in terms of using kexec() to boot to a new > > > kernel when the old kernel has enabled TDX: 1) Part of the memory > > > pages are still TDX private pages; 2) There might be dirty > > > cachelines associated with TDX private pages. > > Does TDX support hibernate? > No. > > There's a whole bunch of volatile state that's generated inside the CPU > and never leaves the CPU, like the ephemeral key that protects TDX > module memory. > > SGX, for instance, never even supported suspend, IIRC. Enclaves just > die and have to be rebuilt. Right. AFAICT TDX cannot survive from S3 either. All TDX keys get lost when system enters S3. However I don't think TDX can be rebuilt after resume like SGX. Let me confirm with TDX guys on this. I think we can register syscore_ops->suspend for TDX, and refuse to suspend when TDX is enabled. This covers hibernate case too. In terms of how to check "TDX is enabled", ideally it's better to check whether TDX module is actually initialized, but the worst case is we can use platform_tdx_enabled(). (I need to think more on this) Hi Dave, Kirill, Rick, Is this solution overall acceptable?
On 9/18/23 05:08, Huang, Kai wrote: > On Fri, 2023-09-15 at 10:50 -0700, Dave Hansen wrote: >> On 9/15/23 10:43, Edgecombe, Rick P wrote: >>> On Sat, 2023-08-26 at 00:14 +1200, Kai Huang wrote: >>>> There are two problems in terms of using kexec() to boot to a new >>>> kernel when the old kernel has enabled TDX: 1) Part of the memory >>>> pages are still TDX private pages; 2) There might be dirty >>>> cachelines associated with TDX private pages. >>> Does TDX support hibernate? >> No. >> >> There's a whole bunch of volatile state that's generated inside the CPU >> and never leaves the CPU, like the ephemeral key that protects TDX >> module memory. >> >> SGX, for instance, never even supported suspend, IIRC. Enclaves just >> die and have to be rebuilt. > > Right. AFAICT TDX cannot survive from S3 either. All TDX keys get lost when > system enters S3. However I don't think TDX can be rebuilt after resume like > SGX. Let me confirm with TDX guys on this. By "rebuilt" I mean all private data is totally destroyed and rebuilt from scratch. The SGX architecture provides zero help other than delivering a fault and saying: "whoops all your data is gone". > I think we can register syscore_ops->suspend for TDX, and refuse to suspend when > TDX is enabled. This covers hibernate case too. > > In terms of how to check "TDX is enabled", ideally it's better to check whether > TDX module is actually initialized, but the worst case is we can use > platform_tdx_enabled(). (I need to think more on this) *Ideally* the firmware would have a choke point where it could just tell the OS that it can't suspend rather than the OS having to figure it out.
On Mon, 2023-09-18 at 08:44 -0700, Dave Hansen wrote: > On 9/18/23 05:08, Huang, Kai wrote: > > On Fri, 2023-09-15 at 10:50 -0700, Dave Hansen wrote: > > > On 9/15/23 10:43, Edgecombe, Rick P wrote: > > > > On Sat, 2023-08-26 at 00:14 +1200, Kai Huang wrote: > > > > > There are two problems in terms of using kexec() to boot to a new > > > > > kernel when the old kernel has enabled TDX: 1) Part of the memory > > > > > pages are still TDX private pages; 2) There might be dirty > > > > > cachelines associated with TDX private pages. > > > > Does TDX support hibernate? > > > No. > > > > > > There's a whole bunch of volatile state that's generated inside the CPU > > > and never leaves the CPU, like the ephemeral key that protects TDX > > > module memory. > > > > > > SGX, for instance, never even supported suspend, IIRC. Enclaves just > > > die and have to be rebuilt. > > > > Right. AFAICT TDX cannot survive from S3 either. All TDX keys get lost when > > system enters S3. However I don't think TDX can be rebuilt after resume like > > SGX. Let me confirm with TDX guys on this. > > By "rebuilt" I mean all private data is totally destroyed and rebuilt > from scratch. The SGX architecture provides zero help other than > delivering a fault and saying: "whoops all your data is gone". Right. For TDX I am worrying about SEAMCALL could poison memory thus could trigger #MC inside kernel, or even could trigger #MC inside SEAM, instead of delivering a fault that SGX app/kernel can handle. I am confirming with TDX team. > > > I think we can register syscore_ops->suspend for TDX, and refuse to suspend when > > TDX is enabled. This covers hibernate case too. > > > > In terms of how to check "TDX is enabled", ideally it's better to check whether > > TDX module is actually initialized, but the worst case is we can use > > platform_tdx_enabled(). (I need to think more on this) > > *Ideally* the firmware would have a choke point where it could just tell > the OS that it can't suspend rather than the OS having to figure it out. Agreed. Let me ask TDX team about this too.
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 72015dba72ab..7e85bd9e5f15 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -811,8 +811,14 @@ void __noreturn stop_this_cpu(void *dummy) * * Test the CPUID bit directly because the machine might've cleared * X86_FEATURE_SME due to cmdline options. + * + * The TDX module or guests might have left dirty cachelines + * behind. Flush them to avoid corruption from later writeback. + * Note that this flushes on all systems where TDX is possible, + * but does not actually check that TDX was in use. */ - if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0))) + if ((c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0))) + || platform_tdx_enabled()) native_wbinvd(); /* diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index 3adbe97015c1..ae7480a213a6 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -32,6 +32,7 @@ #include <asm/realmode.h> #include <asm/x86_init.h> #include <asm/efi.h> +#include <asm/tdx.h> /* * Power off function, if any @@ -695,6 +696,20 @@ void native_machine_shutdown(void) local_irq_disable(); stop_other_cpus(); #endif + /* + * stop_other_cpus() has flushed all dirty cachelines of TDX + * private memory on remote cpus. Unlike SME, which does the + * cache flush on _this_ cpu in the relocate_kernel(), flush + * the cache for _this_ cpu here. This is because on the + * platforms with "partial write machine check" erratum the + * kernel needs to convert all TDX private pages back to normal + * before booting to the new kernel in kexec(), and the cache + * flush must be done before that. If the kernel took SME's way, + * it would have to muck with the relocate_kernel() assembly to + * do memory conversion. + */ + if (platform_tdx_enabled()) + native_wbinvd(); lapic_shutdown(); restore_boot_irq_mode();