From patchwork Mon Mar 6 14:14:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Kai"
X-Patchwork-Id: 13161230
From: Kai Huang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, dave.hansen@intel.com, peterz@infradead.org,
 tglx@linutronix.de, seanjc@google.com, pbonzini@redhat.com,
 dan.j.williams@intel.com, rafael.j.wysocki@intel.com,
 kirill.shutemov@linux.intel.com, ying.huang@intel.com,
 reinette.chatre@intel.com, len.brown@intel.com, tony.luck@intel.com,
 ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
 sathyanarayanan.kuppuswamy@linux.intel.com, david@redhat.com,
 bagasdotme@gmail.com, sagis@google.com, imammedo@redhat.com,
 kai.huang@intel.com
Subject: [PATCH v10 15/16] x86/virt/tdx: Flush cache in kexec() when TDX is
 enabled
Date: Tue, 7 Mar 2023 03:14:00 +1300
Message-Id: <95a37c2f09cbca9d91a858067d309279c714626a.1678111292.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.39.2

There are two problems with using kexec() to boot to a new kernel when
the old kernel has enabled TDX: 1) Some memory pages are still TDX
private pages; 2) There might be dirty cachelines associated with TDX
private pages.

The first problem doesn't matter.  KeyID 0 doesn't have an integrity
check.  Even if the new kernel wants to use a non-zero KeyID, it needs
to convert the memory to that KeyID, and such a conversion works from
any KeyID.

However, the old kernel needs to guarantee that no dirty cachelines are
left behind before booting to the new kernel, to avoid silent corruption
from a later cacheline writeback (Intel hardware doesn't guarantee cache
coherency across different KeyIDs).

There are two things that the old kernel needs to do to achieve that:

 1) Stop accessing TDX private memory mappings:
    a. Stop making TDX module SEAMCALLs (TDX global KeyID);
    b. Stop TDX guests from running (per-guest TDX KeyID).
 2) Flush any cachelines from previous TDX private KeyID writes.

For 2), use wbinvd() to flush the cache in stop_this_cpu(), following
the SME support.  This way 1) happens for free, as there's no TDX
activity between wbinvd() and native_halt().

Theoretically, the cache flush is only needed when the TDX module has
been initialized.  However, the TDX module is initialized on demand at
runtime, and reading the module status takes a mutex.  Instead, just
check whether TDX is enabled by the BIOS to decide whether to flush the
cache.

Signed-off-by: Kai Huang
Reviewed-by: Isaku Yamahata
---

v9 -> v10:
 - No change.

v8 -> v9:
 - Various changelog enhancements and fixes (Dave).
 - Improved comment (Dave).

v7 -> v8:
 - Changelog:
   - Removed the "leave TDX module open" part because the shutdown
     patch has been removed.

v6 -> v7:
 - Improved changelog to explain why TDX private pages are not
   converted back to normal.

---
 arch/x86/kernel/process.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 40d156a31676..5876dda412c7 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -765,8 +765,13 @@ void __noreturn stop_this_cpu(void *dummy)
 	 *
 	 * Test the CPUID bit directly because the machine might've cleared
 	 * X86_FEATURE_SME due to cmdline options.
+	 *
+	 * The TDX module or guests might have left dirty cachelines
+	 * behind.  Flush them to avoid corruption from later writeback.
+	 * Note that this flushes on all systems where TDX is possible,
+	 * but does not actually check that TDX was in use.
 	 */
-	if (cpuid_eax(0x8000001f) & BIT(0))
+	if (cpuid_eax(0x8000001f) & BIT(0) || platform_tdx_enabled())
 		native_wbinvd();
 	for (;;) {
 		/*