Message ID | 17bcbe3e154415ee7a4c77489809a3db0c5ddf3f.1685887183.git.kai.huang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | TDX host kernel support |
On Mon, Jun 05, 2023 at 02:27:30AM +1200, Kai Huang wrote:
> There are two problems in terms of using kexec() to boot to a new kernel
> when the old kernel has enabled TDX: 1) Part of the memory pages are
> still TDX private pages; 2) There might be dirty cachelines associated
> with TDX private pages.
>
> The first problem doesn't matter on the platforms w/o the "partial write
> machine check" erratum. KeyID 0 doesn't have integrity check. If the
> new kernel wants to use any non-zero KeyID, it needs to convert the
> memory to that KeyID and such conversion would work from any KeyID.
>
> However the old kernel needs to guarantee there's no dirty cacheline
> left behind before booting to the new kernel to avoid silent corruption
> from later cacheline writeback (Intel hardware doesn't guarantee cache
> coherency across different KeyIDs).
>
> There are two things that the old kernel needs to do to achieve that:
>
> 1) Stop accessing TDX private memory mappings:
>    a. Stop making TDX module SEAMCALLs (TDX global KeyID);
>    b. Stop TDX guests from running (per-guest TDX KeyID).
> 2) Flush any cachelines from previous TDX private KeyID writes.
>
> For 2), use wbinvd() to flush cache in stop_this_cpu(), following SME
> support. And in this way 1) happens for free as there's no TDX activity
> between wbinvd() and the native_halt().
>
> Flushing cache in stop_this_cpu() only flushes cache on remote cpus. On
> the cpu which does kexec(), unlike SME which does the cache flush in
> relocate_kernel(), do the cache flush right after stopping remote cpus
> in machine_shutdown(). This is because on the platforms with above
> erratum, the kernel needs to convert all TDX private pages back to
> normal before a fast warm reset reboot or booting to the new kernel in
> kexec(). Flushing cache in relocate_kernel() only covers the kexec()
> but not the fast warm reset reboot.
>
> Theoretically, cache flush is only needed when the TDX module has been
> initialized. However initializing the TDX module is done on demand at
> runtime, and it takes a mutex to read the module status. Just check
> whether TDX is enabled by the BIOS instead to flush cache.
>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
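The flush ordering described in the commit message can be pictured with a small user-space model. This is only a sketch for illustration, not kernel code: `struct model_cpu`, `MODEL_NR_CPUS` and every `model_*` function are hypothetical stand-ins for the real CPUs, `stop_this_cpu()`, `native_machine_shutdown()` and `wbinvd`. It assumes each remote CPU flushes its own cache as it stops, and the CPU driving the kexec or reboot flushes right after the others have stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Hypothetical per-CPU model state, not a kernel data structure. */
struct model_cpu {
	bool dirty_private_lines;	/* cache may hold dirty TDX private lines */
	bool stopped;			/* CPU has been halted */
};

/* Models wbinvd: write back and invalidate this CPU's cache. */
static void model_wbinvd(struct model_cpu *cpu)
{
	cpu->dirty_private_lines = false;
}

/* Models stop_this_cpu() on a remote CPU: flush first, then halt. */
static void model_stop_this_cpu(struct model_cpu *cpu, bool tdx_enabled)
{
	if (tdx_enabled)
		model_wbinvd(cpu);
	cpu->stopped = true;
}

/* Models native_machine_shutdown() on the CPU doing the kexec or reboot. */
static void model_machine_shutdown(struct model_cpu cpus[], size_t self,
				   bool tdx_enabled)
{
	assert(self < MODEL_NR_CPUS);

	/* Stop every other CPU; each one flushes its own cache on the way down. */
	for (size_t i = 0; i < MODEL_NR_CPUS; i++)
		if (i != self)
			model_stop_this_cpu(&cpus[i], tdx_enabled);

	/* Flush the local cache right after the remote CPUs have stopped. */
	if (tdx_enabled)
		model_wbinvd(&cpus[self]);
}

/* Returns true if any modeled CPU still holds dirty private lines. */
static bool model_any_dirty(const struct model_cpu cpus[])
{
	for (size_t i = 0; i < MODEL_NR_CPUS; i++)
		if (cpus[i].dirty_private_lines)
			return true;
	return false;
}
```

In this model, calling `model_machine_shutdown()` with `tdx_enabled` set leaves no CPU with dirty private lines, whichever path (kexec or warm reset) would follow.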
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index dac41a0072ea..0ce66deb9bc8 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -780,8 +780,13 @@ void __noreturn stop_this_cpu(void *dummy)
 	 *
 	 * Test the CPUID bit directly because the machine might've cleared
 	 * X86_FEATURE_SME due to cmdline options.
+	 *
+	 * The TDX module or guests might have left dirty cachelines
+	 * behind. Flush them to avoid corruption from later writeback.
+	 * Note that this flushes on all systems where TDX is possible,
+	 * but does not actually check that TDX was in use.
 	 */
-	if (cpuid_eax(0x8000001f) & BIT(0))
+	if (cpuid_eax(0x8000001f) & BIT(0) || platform_tdx_enabled())
 		native_wbinvd();
 	for (;;) {
 		/*
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 3adbe97015c1..b3d0e015dae2 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -32,6 +32,7 @@
 #include <asm/realmode.h>
 #include <asm/x86_init.h>
 #include <asm/efi.h>
+#include <asm/tdx.h>
 
 /*
  * Power off function, if any
@@ -695,6 +696,20 @@ void native_machine_shutdown(void)
 	local_irq_disable();
 	stop_other_cpus();
 #endif
+	/*
+	 * stop_other_cpus() has flushed all dirty cachelines of TDX
+	 * private memory on remote cpus. Unlike SME, which does the
+	 * cache flush on _this_ cpu in the relocate_kernel(), flush
+	 * the cache for _this_ cpu here. This is because on the
+	 * platforms with "partial write machine check" erratum the
+	 * kernel needs to convert all TDX private pages back to normal
+	 * before a fast warm reset reboot or booting to the new kernel
+	 * in kexec(), and the cache flush must be done before that.
+	 * Flushing cache in relocate_kernel() only covers the kexec()
+	 * but not the fast warm reset reboot.
+	 */
+	if (platform_tdx_enabled())
+		native_wbinvd();
 	lapic_shutdown();
 	restore_boot_irq_mode();
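For readers checking the new condition in stop_this_cpu(): in C, `&` binds tighter than `||`, so the test reads as "(SME CPUID bit set) or (TDX enabled by the BIOS)". A minimal user-space sketch of that predicate follows. `MODEL_BIT`, `model_should_flush()` and `model_flush_selfcheck()` are hypothetical names introduced here for illustration, and the two parameters stand in for the results of `cpuid_eax(0x8000001f)` and `platform_tdx_enabled()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's BIT() macro. */
#define MODEL_BIT(n) (UINT32_C(1) << (n))

/*
 * Models the flush decision in stop_this_cpu().
 * eax:         stand-in for cpuid_eax(0x8000001f)
 * tdx_enabled: stand-in for platform_tdx_enabled()
 */
static bool model_should_flush(uint32_t eax, bool tdx_enabled)
{
	/* Parenthesized form of: eax & BIT(0) || tdx_enabled */
	return (eax & MODEL_BIT(0)) || tdx_enabled;
}

/* Self-check covering the four input combinations. */
static void model_flush_selfcheck(void)
{
	assert(!model_should_flush(0x0, false));	/* neither SME nor TDX */
	assert(model_should_flush(0x1, false));		/* SME bit only */
	assert(model_should_flush(0x0, true));		/* TDX only */
	assert(model_should_flush(0x1, true));		/* both */
}
```

The flush is therefore taken whenever either feature is possible on the platform, matching the comment's note that TDX being in use is not actually checked.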