From patchwork Wed Feb 26 19:55:01 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993089
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe, Isaku Yamahata, Sean Christopherson, Kai Huang, Dave Hansen
Subject: [PATCH 01/29] x86/virt/tdx: Add SEAMCALL wrapper tdh_mem_sept_add() to add SEPT pages
Date: Wed, 26 Feb 2025 14:55:01 -0500
Message-ID: <20250226195529.2314580-2-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Isaku Yamahata

The TDX architecture introduces the concept of a private GPA versus a
shared GPA, depending on the GPA.SHARED bit. The TDX module maintains a
Secure EPT (S-EPT or SEPT) tree per TD for private-GPA-to-HPA
translation.

Wrap the TDH.MEM.SEPT.ADD SEAMCALL with tdh_mem_sept_add() to provide
pages to the TDX module for building a TD's SEPT tree. (Refer to these
pages as SEPT pages.) Callers allocate and provide a normal page to
tdh_mem_sept_add(), which passes the page to the TDX module via the
SEAMCALL TDH.MEM.SEPT.ADD. The TDX module then installs the page into
the SEPT tree and encrypts this SEPT page with the TD's guest keyID.
The kernel cannot use the SEPT page until after reclaiming it via
TDH.MEM.SEPT.REMOVE or TDH.PHYMEM.PAGE.RECLAIM.

Before passing the page to the TDX module, tdh_mem_sept_add() performs
a CLFLUSH on the page mapped with keyID 0 to ensure that any dirty
cache lines don't write back later and clobber TD memory or control
structures. Don't worry about the other MK-TME keyIDs because the
kernel doesn't use them. The TDX docs specify that this flush is not
needed unless the TDX module exposes the CLFLUSH_BEFORE_ALLOC feature
bit. Do the CLFLUSH unconditionally for two reasons: it makes the
solution simpler by having a single path that handles both the
!CLFLUSH_BEFORE_ALLOC and CLFLUSH_BEFORE_ALLOC cases, and it avoids
wading into any correctness uncertainty by going with a conservative
solution to start.

Callers specify "GPA" and "level" so the TDX module installs the SEPT
page at the specified position in the SEPT. Do not include the root
page level in "level", since TDH.MEM.SEPT.ADD can only add non-root
pages to the SEPT. Ensure "level" is between 1 and 3 for a 4-level
SEPT, or between 1 and 4 for a 5-level SEPT.

tdh_mem_sept_add() can be called during the TD's build time or during
the TD's runtime. Check for errors from the function return value and
retrieve extended error info from the function output parameters.

The TDX module has many internal locks. To avoid staying in SEAM mode
for too long, SEAMCALLs return a BUSY error code to the kernel instead
of spinning on the locks. Depending on the specific SEAMCALL, the
caller may need to handle this error in specific ways (e.g., retry).
Therefore, return the SEAMCALL error code directly to the caller;
don't attempt to handle it in the core kernel.

TDH.MEM.SEPT.ADD effectively manages two internal resources of the TDX
module: it installs page table pages in the SEPT tree and also updates
the TDX module's page metadata (PAMT). Don't add a wrapper for the
matching SEAMCALL for removing a SEPT page (TDH.MEM.SEPT.REMOVE),
because KVM, as the only in-kernel user, will only tear down the SEPT
tree when the TD itself is being torn down. When that happens, it can
just perform other operations that reclaim the SEPT pages for the host
kernel to use, update the PAMT, and let the SEPT tree be discarded.
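To make the calling convention concrete, here is a minimal,
hypothetical caller sketch (not part of this patch; the allocation
strategy, the retry-on-BUSY loop, and the bare TDX_OPERAND_BUSY
comparison are simplifying assumptions — real callers mask status bits
and bound their retries):

/*
 * Hypothetical sketch: install one SEPT page covering @gpa.
 * pg_level_to_tdx_sept_level() converts KVM's enum pg_level to the
 * TDX level encoding expected by TDH.MEM.SEPT.ADD.
 */
static int sept_add_one(struct tdx_td *td, u64 gpa, enum pg_level level)
{
	struct page *page = alloc_page(GFP_KERNEL_ACCOUNT);
	u64 err, ext_err1, ext_err2;

	if (!page)
		return -ENOMEM;

	do {
		err = tdh_mem_sept_add(td, gpa,
				       pg_level_to_tdx_sept_level(level),
				       page, &ext_err1, &ext_err2);
	} while (err == TDX_OPERAND_BUSY);	/* error-code name assumed */

	if (err) {
		__free_page(page);	/* the module did not take ownership */
		return -EIO;
	}
	return 0;
}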
[Kai: Switched from generic seamcall export]
[Yan: Re-wrote the changelog]
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Kai Huang
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
Message-ID: <20241112073624.22114-1-yan.y.zhao@intel.com>
Acked-by: Dave Hansen
Signed-off-by: Paolo Bonzini
---
 arch/x86/include/asm/tdx.h  |  7 ++++++-
 arch/x86/virt/vmx/tdx/tdx.c | 19 +++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  1 +
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 7dd71ca3eb57..5c9bdf68002c 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -147,7 +147,6 @@ struct tdx_vp {
 	struct page **tdcx_pages;
 };
 
-
 static inline u64 mk_keyed_paddr(u16 hkid, struct page *page)
 {
 	u64 ret;
@@ -157,10 +156,16 @@ static inline u64 mk_keyed_paddr(u16 hkid, struct page *page)
 
 	ret |= (u64)hkid << boot_cpu_data.x86_phys_bits;
 
 	return ret;
+}
+
+static inline int pg_level_to_tdx_sept_level(enum pg_level level)
+{
+	WARN_ON_ONCE(level == PG_LEVEL_NONE);
+	return level - 1;
 }
 
 u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
+u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page);
 u64 tdh_mng_key_config(struct tdx_td *td);
 u64 tdh_mng_create(struct tdx_td *td, u16 hkid);

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 3a272e9ff2ca..506a75fbac0b 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1529,6 +1529,25 @@ u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page)
 }
 EXPORT_SYMBOL_GPL(tdh_mng_addcx);
 
+u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2)
+{
+	struct tdx_module_args args = {
+		.rcx = gpa | level,
+		.rdx = tdx_tdr_pa(td),
+		.r8 = page_to_phys(page),
+	};
+	u64 ret;
+
+	tdx_clflush_page(page);
+	ret = seamcall_ret(TDH_MEM_SEPT_ADD, &args);
+
+	*ext_err1 = args.rcx;
+	*ext_err2 = args.rdx;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mem_sept_add);
+
 u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page)
 {
 	struct tdx_module_args args = {

diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index da384387d4eb..f3e37df4c63a 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -15,6 +15,7 @@
  * TDX module SEAMCALL leaf functions
  */
 #define TDH_MNG_ADDCX		1
+#define TDH_MEM_SEPT_ADD	3
 #define TDH_VP_ADDCX		4
 #define TDH_MNG_KEY_CONFIG	8
 #define TDH_MNG_CREATE		9

From patchwork Wed Feb 26 19:55:02 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993090
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe, Isaku Yamahata, Sean Christopherson, Kai Huang, Dave Hansen
Subject: [PATCH 02/29] x86/virt/tdx: Add SEAMCALL wrappers to add TD private pages
Date: Wed, 26 Feb 2025 14:55:02 -0500
Message-ID: <20250226195529.2314580-3-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Isaku Yamahata

The TDX architecture introduces the concept of a private GPA versus a
shared GPA, depending on the GPA.SHARED bit. The TDX module maintains a
Secure EPT (S-EPT or SEPT) tree per TD to translate the TD's private
memory accessed using a private GPA.
Wrap the SEAMCALL TDH.MEM.PAGE.ADD with tdh_mem_page_add() and
TDH.MEM.PAGE.AUG with tdh_mem_page_aug() to add TD private pages and
map them to the TD's private GPAs in the SEPT.

Callers of tdh_mem_page_add() and tdh_mem_page_aug() allocate and
provide normal pages to the wrappers, which in turn pass those pages
to the TDX module.

Before passing the pages to the TDX module, tdh_mem_page_add() and
tdh_mem_page_aug() perform a CLFLUSH on the page mapped with keyID 0
to ensure that any dirty cache lines don't write back later and
clobber TD memory or control structures. Don't worry about the other
MK-TME keyIDs because the kernel doesn't use them. The TDX docs
specify that this flush is not needed unless the TDX module exposes
the CLFLUSH_BEFORE_ALLOC feature bit. Do the CLFLUSH unconditionally
for two reasons: it makes the solution simpler by having a single path
that handles both the !CLFLUSH_BEFORE_ALLOC and CLFLUSH_BEFORE_ALLOC
cases, and it avoids wading into any correctness uncertainty by going
with a conservative solution to start.

Call tdh_mem_page_add() to add a private page to a TD during the TD's
build time (i.e., before TDH.MR.FINALIZE). Specify which GPA the 4K
private page will map to; there is no need to specify level info,
since TDH.MEM.PAGE.ADD only adds pages at the 4K level. To provide
initial contents to the TD, supply an additional source page residing
in memory managed by the host kernel itself (encrypted with a shared
keyID). The TDX module copies the initial contents from the source
page in shared memory into the private page after mapping the page in
the SEPT to the specified private GPA. The TDX module allows the
source page to be the same page as the private page to be added; in
that case, the TDX module converts and encrypts the source page as a
TD private page.

Call tdh_mem_page_aug() to add a private page to a TD during the TD's
runtime (i.e., after TDH.MR.FINALIZE). TDH.MEM.PAGE.AUG supports
adding huge pages. Specify which GPA the private page will map to,
along with level info embedded in the lower bits of the GPA. The TDX
module recognizes the added page as the TD's private page only after
the TD accepts it with the TDCALL TDG.MEM.PAGE.ACCEPT.

tdh_mem_page_add() and tdh_mem_page_aug() may fail. Callers can check
the function return value and retrieve extended error info from the
function output parameters.

The TDX module has many internal locks. To avoid staying in SEAM mode
for too long, SEAMCALLs return a BUSY error code to the kernel instead
of spinning on the locks. Depending on the specific SEAMCALL, the
caller may need to handle this error in specific ways (e.g., retry).
Therefore, return the SEAMCALL error code directly to the caller;
don't attempt to handle it in the core kernel.
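As a rough sketch of the build-time flow (a hypothetical caller, not
part of this patch; the helper name, the error handling, and the use
of a separate source page are illustrative assumptions):

/*
 * Hypothetical build-time sketch: map @page at the 4K private @gpa and
 * seed it with the contents of @source. The analogous runtime call is
 * tdh_mem_page_aug(), which additionally takes the mapping level.
 */
static int td_add_initial_page(struct tdx_td *td, u64 gpa,
			       struct page *page, struct page *source)
{
	u64 err, ext_err1, ext_err2;

	err = tdh_mem_page_add(td, gpa, page, source, &ext_err1, &ext_err2);
	if (err) {
		pr_err("TDH.MEM.PAGE.ADD failed: %#llx (%#llx, %#llx)\n",
		       err, ext_err1, ext_err2);
		return -EIO;
	}
	return 0;
}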
[Kai: Switched from generic seamcall export]
[Yan: Re-wrote the changelog]
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Kai Huang
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
Message-ID: <20241112073636.22129-1-yan.y.zhao@intel.com>
Acked-by: Dave Hansen
Signed-off-by: Paolo Bonzini
---
 arch/x86/include/asm/tdx.h  |  2 ++
 arch/x86/virt/vmx/tdx/tdx.c | 39 +++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  2 ++
 3 files changed, 43 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 5c9bdf68002c..9f978db2560f 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -165,8 +165,10 @@ static inline int pg_level_to_tdx_sept_level(enum pg_level level)
 }
 
 u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
+u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, struct page *page, struct page *source, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page);
+u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_mng_key_config(struct tdx_td *td);
 u64 tdh_mng_create(struct tdx_td *td, u16 hkid);
 u64 tdh_vp_create(struct tdx_td *td, struct tdx_vp *vp);

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 506a75fbac0b..4ae10246260e 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1529,6 +1529,26 @@ u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page)
 }
 EXPORT_SYMBOL_GPL(tdh_mng_addcx);
 
+u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, struct page *page, struct page *source, u64 *ext_err1, u64 *ext_err2)
+{
+	struct tdx_module_args args = {
+		.rcx = gpa,
+		.rdx = tdx_tdr_pa(td),
+		.r8 = page_to_phys(page),
+		.r9 = page_to_phys(source),
+	};
+	u64 ret;
+
+	tdx_clflush_page(page);
+	ret = seamcall_ret(TDH_MEM_PAGE_ADD, &args);
+
+	*ext_err1 = args.rcx;
+	*ext_err2 = args.rdx;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mem_page_add);
+
 u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2)
 {
 	struct tdx_module_args args = {
@@ -1560,6 +1580,25 @@ u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page)
 }
 EXPORT_SYMBOL_GPL(tdh_vp_addcx);
 
+u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2)
+{
+	struct tdx_module_args args = {
+		.rcx = gpa | level,
+		.rdx = tdx_tdr_pa(td),
+		.r8 = page_to_phys(page),
+	};
+	u64 ret;
+
+	tdx_clflush_page(page);
+	ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
+
+	*ext_err1 = args.rcx;
+	*ext_err2 = args.rdx;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mem_page_aug);
+
 u64 tdh_mng_key_config(struct tdx_td *td)
 {
 	struct tdx_module_args args = {

diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index f3e37df4c63a..5879bdb0045f 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -15,8 +15,10 @@
  * TDX module SEAMCALL leaf functions
  */
 #define TDH_MNG_ADDCX		1
+#define TDH_MEM_PAGE_ADD	2
 #define TDH_MEM_SEPT_ADD	3
 #define TDH_VP_ADDCX		4
+#define TDH_MEM_PAGE_AUG	6
 #define TDH_MNG_KEY_CONFIG	8
 #define TDH_MNG_CREATE		9
 #define TDH_MNG_RD		11

From patchwork Wed Feb 26 19:55:03 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993103
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe, Isaku Yamahata, Sean Christopherson, Kai Huang, Dave Hansen
Subject: [PATCH 03/29] x86/virt/tdx: Add SEAMCALL wrappers to manage TDX TLB tracking
Date: Wed, 26 Feb 2025 14:55:03 -0500
Message-ID: <20250226195529.2314580-4-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Isaku Yamahata

The TDX module defines a TLB tracking protocol to make sure that no
logical processor holds any stale Secure EPT (S-EPT or SEPT) TLB
translations for a given TD private GPA range. After a successful
TDH.MEM.RANGE.BLOCK and TDH.MEM.TRACK, and after kicking all vCPUs out
of the guest, the TDX module ensures that the subsequent TDH.VP.ENTER
on each vCPU flushes all stale TLB entries for the GPA ranges
specified in TDH.MEM.RANGE.BLOCK.

Wrap TDH.MEM.RANGE.BLOCK with tdh_mem_range_block() and TDH.MEM.TRACK
with tdh_mem_track() to enable the kernel to assist the TDX module in
TLB tracking management.

The caller of tdh_mem_range_block() specifies "GPA" and "level" to
request that the TDX module block the subsequent creation of TLB
translations for that GPA range. The range can correspond to a SEPT
page or to a TD private page at any level.

Contention and errors are possible with the SEAMCALL
TDH.MEM.RANGE.BLOCK. Therefore, the caller of tdh_mem_range_block()
needs to check the function return value and retrieve extended error
info from the function output params.

Upon TDH.MEM.RANGE.BLOCK success, no new TLB entries will be created
for the specified private GPA range, though existing TLB translations
may still persist. TDH.MEM.TRACK then advances the TD's epoch counter
to ensure the TDX module flushes the TLBs on all vCPUs once the vCPUs
re-enter the TD. TDH.MEM.TRACK fails to advance the TD's epoch counter
if there are vCPUs still running in non-root mode at the previous TD
epoch. So, to ensure private GPA translations are flushed, callers
must first call tdh_mem_range_block(), then tdh_mem_track(), and
lastly send IPIs to kick all the vCPUs and force them to re-enter,
thus triggering the TLB flush.

Don't export a single combined operation; instead, export functions
that expose the block and track operations separately, for a few
reasons:
1. The vCPU kick should use KVM's functionality for doing this, which
   can better target sending IPIs to only the minimum required pCPUs.
2. tdh_mem_track() doesn't need to be executed if a vCPU has not
   entered a TD, which is information only KVM knows.
3. Leaving the operations separate will allow for batching many
   tdh_mem_range_block() calls before a tdh_mem_track(). While this
   batching will not be done initially by KVM, it demonstrates that
   keeping mem block and track as separate operations is a generally
   good design.

Contention is also possible in TDH.MEM.TRACK. For example,
TDH.MEM.TRACK may contend with TDH.VP.ENTER when advancing the TD
epoch counter. tdh_mem_track() does not retry on behalf of the caller;
callers can choose to avoid contention or retry on their own.
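The required ordering can be summarized with a hypothetical sketch
(not part of this patch; KVM's real code batches blocks, skips the
track when no vCPU has entered the TD, and targets the kick more
precisely — the request-based kick below is an assumption):

/* Hypothetical sketch of the block -> track -> kick sequence. */
static int td_flush_private_range(struct kvm *kvm, struct tdx_td *td,
				  u64 gpa, int tdx_level)
{
	u64 err, ext_err1, ext_err2;

	err = tdh_mem_range_block(td, gpa, tdx_level, &ext_err1, &ext_err2);
	if (err)
		return -EIO;

	err = tdh_mem_track(td);	/* may contend with TDH.VP.ENTER */
	if (err)
		return -EIO;

	/* Kick all vCPUs; their next TDH.VP.ENTER flushes the stale TLBs. */
	kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE);
	return 0;
}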
[Kai: Switched from generic seamcall export]
[Yan: Re-wrote the changelog]
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Kai Huang
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
Message-ID: <20241112073648.22143-1-yan.y.zhao@intel.com>
Acked-by: Dave Hansen
Signed-off-by: Paolo Bonzini
---
 arch/x86/include/asm/tdx.h  |  2 ++
 arch/x86/virt/vmx/tdx/tdx.c | 27 +++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  2 ++
 3 files changed, 31 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 9f978db2560f..cc050cc1367a 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -169,6 +169,7 @@ u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, struct page *page, struct page
 u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page);
 u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2);
+u64 tdh_mem_range_block(struct tdx_td *td, u64 gpa, int level, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_mng_key_config(struct tdx_td *td);
 u64 tdh_mng_create(struct tdx_td *td, u16 hkid);
 u64 tdh_vp_create(struct tdx_td *td, struct tdx_vp *vp);
@@ -181,6 +182,7 @@ u64 tdh_vp_init(struct tdx_vp *vp, u64 initial_rcx, u32 x2apicid);
 u64 tdh_vp_rd(struct tdx_vp *vp, u64 field, u64 *data);
 u64 tdh_vp_wr(struct tdx_vp *vp, u64 field, u64 data, u64 mask);
 u64 tdh_phymem_page_reclaim(struct page *page, u64 *tdx_pt, u64 *tdx_owner, u64 *tdx_size);
+u64 tdh_mem_track(struct tdx_td *tdr);
 u64 tdh_phymem_cache_wb(bool resume);
 u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td);
 #else

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 4ae10246260e..f920754bb35e 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1599,6 +1599,23 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
 }
 EXPORT_SYMBOL_GPL(tdh_mem_page_aug);
 
+u64 tdh_mem_range_block(struct tdx_td *td, u64 gpa, int level, u64 *ext_err1, u64 *ext_err2)
+{
+	struct tdx_module_args args = {
+		.rcx = gpa | level,
+		.rdx = tdx_tdr_pa(td),
+	};
+	u64 ret;
+
+	ret = seamcall_ret(TDH_MEM_RANGE_BLOCK, &args);
+
+	*ext_err1 = args.rcx;
+	*ext_err2 = args.rdx;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mem_range_block);
+
 u64 tdh_mng_key_config(struct tdx_td *td)
 {
 	struct tdx_module_args args = {
@@ -1761,6 +1778,16 @@ u64 tdh_phymem_page_reclaim(struct page *page, u64 *tdx_pt, u64 *tdx_owner, u64
 }
 EXPORT_SYMBOL_GPL(tdh_phymem_page_reclaim);
 
+u64 tdh_mem_track(struct tdx_td *td)
+{
+	struct tdx_module_args args = {
+		.rcx = tdx_tdr_pa(td),
+	};
+
+	return seamcall(TDH_MEM_TRACK, &args);
+}
+EXPORT_SYMBOL_GPL(tdh_mem_track);
+
 u64 tdh_phymem_cache_wb(bool resume)
 {
 	struct tdx_module_args args = {

diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 5879bdb0045f..104c97abf264 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -19,6 +19,7 @@
 #define TDH_MEM_SEPT_ADD	3
 #define TDH_VP_ADDCX		4
 #define TDH_MEM_PAGE_AUG	6
+#define TDH_MEM_RANGE_BLOCK	7
 #define TDH_MNG_KEY_CONFIG	8
 #define TDH_MNG_CREATE		9
 #define TDH_MNG_RD		11
@@ -36,6 +37,7 @@
 #define TDH_SYS_RD		34
 #define TDH_SYS_LP_INIT		35
 #define TDH_SYS_TDMR_INIT	36
+#define TDH_MEM_TRACK		38
 #define TDH_PHYMEM_CACHE_WB	40
 #define TDH_PHYMEM_PAGE_WBINVD	41
 #define TDH_VP_WR		43

From patchwork Wed Feb 26 19:55:04 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993091
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe, Isaku Yamahata, Sean Christopherson, Kai Huang, Dave Hansen
Subject: [PATCH 04/29] x86/virt/tdx: Add SEAMCALL wrappers to remove a TD private page
Date: Wed, 26 Feb 2025 14:55:04 -0500
Message-ID: <20250226195529.2314580-5-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Isaku Yamahata

The TDX architecture introduces the concept of a private GPA versus a
shared GPA, depending on the GPA.SHARED bit. The TDX module maintains
a single Secure EPT (S-EPT or SEPT) tree per TD to translate the TD's
private memory accessed using a private GPA.

Wrap the SEAMCALL TDH.MEM.PAGE.REMOVE with tdh_mem_page_remove() and
TDH.PHYMEM.PAGE.WBINVD with tdh_phymem_page_wbinvd_hkid() to unmap a
TD private page from the SEPT, remove the TD private page from the TDX
module, and flush cache lines to memory after removal of the private
page.

Callers specify "GPA" and "level" when calling tdh_mem_page_remove()
to indicate to the TDX module which TD private page to unmap and
remove. TDH.MEM.PAGE.REMOVE may fail, and the caller of
tdh_mem_page_remove() can check the function return value and retrieve
extended error information from the function output parameters. Follow
the TLB tracking protocol before calling tdh_mem_page_remove() to
remove a TD private page, to avoid SEAMCALL failure.

After removing a TD's private page, the TDX module does not write back
and invalidate the cache lines associated with the page and the page's
keyID (i.e., the TD's guest keyID). Therefore, provide
tdh_phymem_page_wbinvd_hkid() to allow the caller to pass in the TD's
guest keyID and invoke TDH.PHYMEM.PAGE.WBINVD to perform this action.
Before reusing the page, the host kernel needs to map the page with
keyID 0 and invoke movdir64b() to convert the TD private page back to
a normal shared page.

TDH.MEM.PAGE.REMOVE and TDH.PHYMEM.PAGE.WBINVD may hit contention
inside the TDX module on TDX's internal resources. To avoid staying in
SEAM mode for too long, the TDX module returns a BUSY error code to
the kernel instead of spinning on the locks. The caller may need to
handle this error in specific ways (e.g., retry). The wrappers return
the SEAMCALL error code directly to the caller; don't attempt to
handle it in the core kernel.
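Putting the pieces together, a hypothetical removal sketch might look
as follows (not part of this patch; it assumes the TLB-tracking
protocol was already followed, and the movdir64b() usage and its exact
signature here are assumptions mirroring the changelog's description
rather than any specific helper):

static int td_remove_private_page(struct tdx_td *td, u64 gpa, u16 hkid,
				  struct page *page)
{
	void *va = page_address(page);	/* keyID-0 direct mapping */
	u64 err, ext_err1, ext_err2;
	unsigned long i;

	err = tdh_mem_page_remove(td, gpa, 0 /* TDX level 0 == 4K */,
				  &ext_err1, &ext_err2);
	if (err)
		return -EIO;

	/* Flush cache lines keyed with the TD's guest keyID. */
	err = tdh_phymem_page_wbinvd_hkid(hkid, page);
	if (err)
		return -EIO;

	/* Convert back to a normal shared page with direct stores. */
	for (i = 0; i < PAGE_SIZE; i += 64)
		movdir64b(va + i, (void *)empty_zero_page);

	return 0;
}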
[Kai: Switched from generic seamcall export]
[Yan: Re-wrote the changelog]
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Kai Huang
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
Message-ID: <20241112073658.22157-1-yan.y.zhao@intel.com>
Acked-by: Dave Hansen
Signed-off-by: Paolo Bonzini
---
 arch/x86/include/asm/tdx.h  |  2 ++
 arch/x86/virt/vmx/tdx/tdx.c | 27 +++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  1 +
 3 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index cc050cc1367a..28d25d1bfa96 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -183,8 +183,10 @@ u64 tdh_vp_rd(struct tdx_vp *vp, u64 field, u64 *data);
 u64 tdh_vp_wr(struct tdx_vp *vp, u64 field, u64 data, u64 mask);
 u64 tdh_phymem_page_reclaim(struct page *page, u64 *tdx_pt, u64 *tdx_owner, u64 *tdx_size);
 u64 tdh_mem_track(struct tdx_td *tdr);
+u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_phymem_cache_wb(bool resume);
 u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td);
+u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page);
 #else
 static inline void tdx_init(void) { }
 static inline int tdx_cpu_enable(void) { return -ENODEV; }

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index f920754bb35e..ffa0f6b8254d 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1788,6 +1788,23 @@ u64 tdh_mem_track(struct tdx_td *td)
 }
 EXPORT_SYMBOL_GPL(tdh_mem_track);
 
+u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_err1, u64 *ext_err2)
+{
+	struct tdx_module_args args = {
+		.rcx = gpa | level,
+		.rdx = tdx_tdr_pa(td),
+	};
+	u64 ret;
+
+	ret = seamcall_ret(TDH_MEM_PAGE_REMOVE, &args);
+
+	*ext_err1 = args.rcx;
+	*ext_err2 = args.rdx;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mem_page_remove);
+
 u64 tdh_phymem_cache_wb(bool resume)
 {
 	struct tdx_module_args args = {
@@ -1807,3 +1824,13 @@ u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td)
 	return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args);
 }
 EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_tdr);
+
+u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page)
+{
+	struct tdx_module_args args = {};
+
+	args.rcx = mk_keyed_paddr(hkid, page);
+
+	return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args);
+}
+EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_hkid);

diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 104c97abf264..c9bea023f2b4 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -32,6 +32,7 @@
 #define TDH_PHYMEM_PAGE_RDMD	24
 #define TDH_VP_RD		26
 #define TDH_PHYMEM_PAGE_RECLAIM	28
+#define TDH_MEM_PAGE_REMOVE	29
 #define TDH_SYS_KEY_CONFIG	31
 #define TDH_SYS_INIT		33
 #define TDH_SYS_RD		34

From patchwork Wed Feb 26 19:55:05 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993093
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe, Isaku Yamahata, Sean Christopherson, Kai Huang, Dave Hansen
Subject: [PATCH 05/29] x86/virt/tdx: Add SEAMCALL wrappers for TD measurement of initial contents
Date: Wed, 26 Feb 2025 14:55:05 -0500
Message-ID: <20250226195529.2314580-6-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Isaku Yamahata

The TDX module measures the TD
during the build process and saves the measurement in TDCS.MRTD to
facilitate TD attestation of the initial contents of the TD.

Wrap the SEAMCALL TDH.MR.EXTEND with tdh_mr_extend() and
TDH.MR.FINALIZE with tdh_mr_finalize() to enable the host kernel to
assist the TDX module in performing the measurement.

The measurement in TDCS.MRTD is a SHA-384 digest of the build process.
The SEAMCALLs TDH.MNG.INIT and TDH.MEM.PAGE.ADD initialize and
contribute to the MRTD digest calculation.

The caller of tdh_mr_extend() should break the TD private page into
chunks of size TDX_EXTENDMR_CHUNKSIZE and invoke tdh_mr_extend() on
each chunk to add the page content into the digest calculation.
Failures are possible with TDH.MR.EXTEND (e.g., due to SEPT walk
errors). The caller of tdh_mr_extend() can check the function return
value and retrieve extended error information from the function output
parameters.

Calling tdh_mr_finalize() completes the measurement. The TDX module
then turns the TD into the runnable state. Further TDH.MEM.PAGE.ADD
and TDH.MR.EXTEND calls will fail.

TDH.MR.FINALIZE may fail due to errors such as the TD having no vCPUs
or contention. Check the function return value when calling
tdh_mr_finalize() to determine the exact reason for failure. Take
proper locks on the caller's side to avoid contention failures, or
handle the BUSY error in specific ways (e.g., retry). Return the
SEAMCALL error code directly to the caller; do not attempt to handle
it in the core kernel.

[Kai: Switched from generic seamcall export]
[Yan: Re-wrote the changelog]
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Kai Huang
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
Message-ID: <20241112073709.22171-1-yan.y.zhao@intel.com>
Acked-by: Dave Hansen
Signed-off-by: Paolo Bonzini
---
 arch/x86/include/asm/tdx.h  |  2 ++
 arch/x86/virt/vmx/tdx/tdx.c | 27 +++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  2 ++
 3 files changed, 31 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 28d25d1bfa96..c0841fc87fa8 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -174,6 +174,8 @@ u64 tdh_mng_key_config(struct tdx_td *td);
 u64 tdh_mng_create(struct tdx_td *td, u16 hkid);
 u64 tdh_vp_create(struct tdx_td *td, struct tdx_vp *vp);
 u64 tdh_mng_rd(struct tdx_td *td, u64 field, u64 *data);
+u64 tdh_mr_extend(struct tdx_td *td, u64 gpa, u64 *ext_err1, u64 *ext_err2);
+u64 tdh_mr_finalize(struct tdx_td *td);
 u64 tdh_vp_flush(struct tdx_vp *vp);
 u64 tdh_mng_vpflushdone(struct tdx_td *td);
 u64 tdh_mng_key_freeid(struct tdx_td *td);

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index ffa0f6b8254d..8eaaf31a4187 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1667,6 +1667,33 @@ u64 tdh_mng_rd(struct tdx_td *td, u64 field, u64 *data)
 }
 EXPORT_SYMBOL_GPL(tdh_mng_rd);
 
+u64 tdh_mr_extend(struct tdx_td *td, u64 gpa, u64 *ext_err1, u64 *ext_err2)
+{
+	struct tdx_module_args args = {
+		.rcx = gpa,
+		.rdx = tdx_tdr_pa(td),
+	};
+	u64 ret;
+
+	ret = seamcall_ret(TDH_MR_EXTEND, &args);
+
+	*ext_err1 = args.rcx;
+	*ext_err2 = args.rdx;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdh_mr_extend);
+
+u64 tdh_mr_finalize(struct tdx_td *td)
+{
+	struct tdx_module_args args = {
+		.rcx = tdx_tdr_pa(td),
+	};
+
+	return seamcall(TDH_MR_FINALIZE, &args);
+}
+EXPORT_SYMBOL_GPL(tdh_mr_finalize);
+
 u64 tdh_vp_flush(struct tdx_vp *vp)
 {
 	struct tdx_module_args args = {
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index c9bea023f2b4..ed7152f81a6d 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -23,6 +23,8 @@
 #define TDH_MNG_KEY_CONFIG	8
 #define TDH_MNG_CREATE		9
 #define TDH_MNG_RD		11
+#define TDH_MR_EXTEND		16
+#define TDH_MR_FINALIZE		17
 #define TDH_VP_FLUSH		18
 #define TDH_MNG_VPFLUSHDONE	19
 #define TDH_VP_CREATE		10
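To illustrate the tdh_mr_extend() loop described in the changelog
above, a hypothetical measurement sketch (the 256-byte
TDX_EXTENDMR_CHUNKSIZE value and the error handling are assumptions;
not part of this patch):

/* Hypothetical sketch: extend MRTD with one already-added private page. */
#define TDX_EXTENDMR_CHUNKSIZE	256	/* assumed chunk size */

static int td_measure_page(struct tdx_td *td, u64 gpa)
{
	u64 err, ext_err1, ext_err2;
	int i;

	for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) {
		err = tdh_mr_extend(td, gpa + i, &ext_err1, &ext_err2);
		if (err)
			return -EIO;
	}
	return 0;
}

/* Once every initial page is measured, tdh_mr_finalize(td) seals MRTD. */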
From patchwork Wed Feb 26 19:55:06 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993101
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe
Subject: [PATCH 06/29] KVM: x86/mmu: Implement memslot deletion for TDX
Date: Wed, 26 Feb 2025 14:55:06 -0500
Message-ID: <20250226195529.2314580-7-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Rick Edgecombe

Update the attr_filter field to zap both private and shared mappings
when a memslot is deleted, as TDX requires.

Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Message-ID: <20241112073426.21997-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/mmu/mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c1c54a7a7735..9d5d50e2464f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7072,6 +7072,7 @@ static void kvm_mmu_zap_memslot(struct kvm *kvm,
 		.start = slot->base_gfn,
 		.end = slot->base_gfn + slot->npages,
 		.may_block = true,
+		.attr_filter = KVM_FILTER_PRIVATE | KVM_FILTER_SHARED,
 	};
 	bool flush;

From patchwork Wed Feb 26 19:55:07 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993092
header.b="hZl5Ir4u" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1740599744; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7CVED5K0DPAsIBLFdlZi8ssHHoQwINfn2NzlJIvPUEk=; b=hZl5Ir4uEv8B2TAGSU0uThgqP7rObKLtYYO/u5C0oAuPqqL3S6IRLLwijmfeomh7vkcJfj GFKuKC7METBvUFcRAIqo59BpDC/ePOfvOUONKqJ8Q72ZLzIcv6rXTPN9rn0edYzR3zVX5I yhO8xShv7SsJp1frA9HAyy43u54TuHQ= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-558-ko3nC-gQNFuhUkyjpuplLA-1; Wed, 26 Feb 2025 14:55:42 -0500 X-MC-Unique: ko3nC-gQNFuhUkyjpuplLA-1 X-Mimecast-MFC-AGG-ID: ko3nC-gQNFuhUkyjpuplLA_1740599741 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 4C01719783B2; Wed, 26 Feb 2025 19:55:41 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 58AAB196ADAB; Wed, 26 Feb 2025 19:55:40 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, Yan Zhao , Rick Edgecombe , Isaku Yamahata Subject: [PATCH 07/29] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU Date: Wed, 26 Feb 2025 14:55:07 -0500 Message-ID: <20250226195529.2314580-8-pbonzini@redhat.com> In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com> References: <20250226195529.2314580-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 From: Isaku Yamahata Export a function to walk down the TDP without modifying it and simply check if a GPA is mapped. Future changes will support pre-populating TDX private memory. In order to implement this KVM will need to check if a given GFN is already pre-populated in the mirrored EPT. [1] There is already a TDP MMU walker, kvm_tdp_mmu_get_walk() for use within the KVM MMU that almost does what is required. However, to make sense of the results, MMU internal PTE helpers are needed. Refactor the code to provide a helper that can be used outside of the KVM MMU code. Refactoring the KVM page fault handler to support this lookup usage was also considered, but it was an awkward fit. kvm_tdp_mmu_gpa_is_mapped() is based on a diff by Paolo Bonzini. 
Link: https://lore.kernel.org/kvm/ZfBkle1eZFfjPI8l@google.com/ [1]
Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
Reviewed-by: Paolo Bonzini
Message-ID: <20241112073457.22011-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/mmu.h         |  3 +++
 arch/x86/kvm/mmu/mmu.c     |  3 +--
 arch/x86/kvm/mmu/tdp_mmu.c | 37 ++++++++++++++++++++++++++++++++-----
 3 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 050a0e229a4d..8d0fca4b4a50 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -253,6 +253,9 @@ extern bool tdp_mmu_enabled;
 #define tdp_mmu_enabled false
 #endif

+bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa);
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level);
+
 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
 	return !tdp_mmu_enabled || kvm_shadow_root_allocated(kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9d5d50e2464f..800684c3b2c9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4685,8 +4685,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	return direct_page_fault(vcpu, fault);
 }

-static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
-			    u8 *level)
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level)
 {
 	int r;

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 046b6ba31197..22675a5746d0 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1894,16 +1894,13 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
  *
  * Must be called between kvm_tdp_mmu_walk_lockless_{begin,end}.
  */
-int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
-			 int *root_level)
+static int __kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
+				  struct kvm_mmu_page *root)
 {
-	struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa);
 	struct tdp_iter iter;
 	gfn_t gfn = addr >> PAGE_SHIFT;
 	int leaf = -1;

-	*root_level = vcpu->arch.mmu->root_role.level;
-
 	tdp_mmu_for_each_pte(iter, vcpu->kvm, root, gfn, gfn + 1) {
 		leaf = iter.level;
 		sptes[leaf] = iter.old_spte;
@@ -1912,6 +1909,36 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
 	return leaf;
 }

+int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
+			 int *root_level)
+{
+	struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa);
+	*root_level = vcpu->arch.mmu->root_role.level;
+
+	return __kvm_tdp_mmu_get_walk(vcpu, addr, sptes, root);
+}
+
+bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa)
+{
+	struct kvm *kvm = vcpu->kvm;
+	bool is_direct = kvm_is_addr_direct(kvm, gpa);
+	hpa_t root = is_direct ? vcpu->arch.mmu->root.hpa :
+				 vcpu->arch.mmu->mirror_root_hpa;
+	u64 sptes[PT64_ROOT_MAX_LEVEL + 1], spte;
+	int leaf;
+
+	lockdep_assert_held(&kvm->mmu_lock);
+	rcu_read_lock();
+	leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, root_to_sp(root));
+	rcu_read_unlock();
+	if (leaf < 0)
+		return false;
+
+	spte = sptes[leaf];
+	return is_shadow_present_pte(spte) && is_last_spte(spte, leaf);
+}
+EXPORT_SYMBOL_GPL(kvm_tdp_mmu_gpa_is_mapped);
+
 /*
  * Returns the last level spte pointer of the shadow page walk for the given
  * gpa, and sets *spte to the spte value. This spte may be non-present. If no
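As a usage illustration (a sketch, not code from this series): a pre-population path could query the new helper under mmu_lock, which kvm_tdp_mmu_gpa_is_mapped() asserts is held; the wrapper name below is made up.

/* Sketch: check whether a private GPA already has a leaf SPTE installed. */
static bool example_gpa_already_populated(struct kvm_vcpu *vcpu, gpa_t gpa)
{
	bool mapped;

	read_lock(&vcpu->kvm->mmu_lock);
	mapped = kvm_tdp_mmu_gpa_is_mapped(vcpu, gpa);
	read_unlock(&vcpu->kvm->mmu_lock);

	return mapped;
}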
From patchwork Wed Feb 26 19:55:08 2025
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe, Binbin Wu, Yuan Yao
Subject: [PATCH 08/29] KVM: x86/mmu: Do not enable page track for TD guest
Date: Wed, 26 Feb 2025 14:55:08 -0500
Message-ID: <20250226195529.2314580-9-pbonzini@redhat.com>

From: Yan Zhao

Fail kvm_page_track_write_tracking_enabled() if the VM type is TDX, so that external page-track users fail in kvm_page_track_register_notifier(): TDX does not support write protection and hence page tracking.

There is no need to fail KVM-internal users of page tracking (i.e. shadow paging), because TDX always runs with EPT enabled, and the TDX module currently does not emulate and send VMLAUNCH/VMRESUME VM exits to the VMM.

Suggested-by: Paolo Bonzini
Signed-off-by: Yan Zhao
Reviewed-by: Binbin Wu
Cc: Yuan Yao
Message-ID: <20241112073515.22028-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/mmu/page_track.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 561c331fd6ec..1b17b12393a8 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -172,6 +172,9 @@ static int kvm_enable_external_write_tracking(struct kvm *kvm)
 	struct kvm_memory_slot *slot;
 	int r = 0, i, bkt;

+	if (kvm->arch.vm_type == KVM_X86_TDX_VM)
+		return -EOPNOTSUPP;
+
 	mutex_lock(&kvm->slots_arch_lock);

 	/*
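To see how the failure surfaces, a hedged sketch of an external page-track user; the function below is illustrative, while kvm_page_track_register_notifier() is the real entry point that now propagates the error for TDs:

/* Sketch: registering a write-track notifier on a TD now fails cleanly. */
static int example_register_tracker(struct kvm *kvm,
				    struct kvm_page_track_notifier_node *node)
{
	int r = kvm_page_track_register_notifier(kvm, node);

	if (r == -EOPNOTSUPP)
		pr_info("write tracking unsupported for this VM type\n");

	return r;
}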
header.b="VdcGE+q1" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1740599752; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mJbcg7Aq0brQ+4WygHgjr2aYCbLwvw5IluBnLBYUNEI=; b=VdcGE+q1XM+RE87/hzAp1EDPKeiRX0c+CKcJ5aPQqHiHj1XVjCNkXopcX+8mZaf5kpzltM ASrKBHF8ujU+OukiYw2Rt1Z3g2dMgfmi2yFtGET2jQLRDdAIu06nZQXeIGsyZ1lzXmzQP7 z+E7bQGZbajsqZVwryBa+UmQkTflkCI= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-643-B3QTnp7yPDeO4aZwocVfPQ-1; Wed, 26 Feb 2025 14:55:45 -0500 X-MC-Unique: B3QTnp7yPDeO4aZwocVfPQ-1 X-Mimecast-MFC-AGG-ID: B3QTnp7yPDeO4aZwocVfPQ_1740599744 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 2C4081800874; Wed, 26 Feb 2025 19:55:44 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id CDFB41944D02; Wed, 26 Feb 2025 19:55:42 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, Yan Zhao , Rick Edgecombe , Sean Christopherson , Isaku Yamahata , Kai Huang , Binbin Wu Subject: [PATCH 09/29] KVM: VMX: Split out guts of EPT violation to common/exposed function Date: Wed, 26 Feb 2025 14:55:09 -0500 Message-ID: <20250226195529.2314580-10-pbonzini@redhat.com> In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com> References: <20250226195529.2314580-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 From: Sean Christopherson The difference of TDX EPT violation is how to retrieve information, GPA, and exit qualification. To share the code to handle EPT violation, split out the guts of EPT violation handler so that VMX/TDX exit handler can call it after retrieving GPA and exit qualification. 
Signed-off-by: Sean Christopherson
Co-developed-by: Isaku Yamahata
Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
Reviewed-by: Paolo Bonzini
Reviewed-by: Kai Huang
Reviewed-by: Binbin Wu
Message-ID: <20241112073528.22042-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/common.h | 34 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c    | 25 +++----------------------
 2 files changed, 37 insertions(+), 22 deletions(-)
 create mode 100644 arch/x86/kvm/vmx/common.h

diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
new file mode 100644
index 000000000000..78ae39b6cdcd
--- /dev/null
+++ b/arch/x86/kvm/vmx/common.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_X86_VMX_COMMON_H
+#define __KVM_X86_VMX_COMMON_H
+
+#include <linux/kvm_host.h>
+
+#include "mmu.h"
+
+static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
+					     unsigned long exit_qualification)
+{
+	u64 error_code;
+
+	/* Is it a read fault? */
+	error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
+		     ? PFERR_USER_MASK : 0;
+	/* Is it a write fault? */
+	error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE)
+		      ? PFERR_WRITE_MASK : 0;
+	/* Is it a fetch fault? */
+	error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
+		      ? PFERR_FETCH_MASK : 0;
+	/* ept page table entry is present? */
+	error_code |= (exit_qualification & EPT_VIOLATION_RWX_MASK)
+		      ? PFERR_PRESENT_MASK : 0;
+
+	if (error_code & EPT_VIOLATION_GVA_IS_VALID)
+		error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ?
+			      PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
+
+	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
+}
+
+#endif /* __KVM_X86_VMX_COMMON_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5a80000d0fdd..18150a38fe09 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -53,6 +53,7 @@
 #include

 #include "capabilities.h"
+#include "common.h"
 #include "cpuid.h"
 #include "hyperv.h"
 #include "kvm_onhyperv.h"
@@ -5791,11 +5792,8 @@ static int handle_task_switch(struct kvm_vcpu *vcpu)

 static int handle_ept_violation(struct kvm_vcpu *vcpu)
 {
-	unsigned long exit_qualification;
+	unsigned long exit_qualification = vmx_get_exit_qual(vcpu);
 	gpa_t gpa;
-	u64 error_code;
-
-	exit_qualification = vmx_get_exit_qual(vcpu);

 	/*
 	 * EPT violation happened while executing iret from NMI,
@@ -5811,23 +5809,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
 	trace_kvm_page_fault(vcpu, gpa, exit_qualification);

-	/* Is it a read fault? */
-	error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
-		     ? PFERR_USER_MASK : 0;
-	/* Is it a write fault? */
-	error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE)
-		      ? PFERR_WRITE_MASK : 0;
-	/* Is it a fetch fault? */
-	error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
-		      ? PFERR_FETCH_MASK : 0;
-	/* ept page table entry is present? */
-	error_code |= (exit_qualification & EPT_VIOLATION_RWX_MASK)
-		      ? PFERR_PRESENT_MASK : 0;
-
-	if (error_code & EPT_VIOLATION_GVA_IS_VALID)
-		error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ?
-			      PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
-
 	/*
 	 * Check that the GPA doesn't exceed physical memory limits, as that is
 	 * a guest page fault.  We have to emulate the instruction here, because
@@ -5839,7 +5820,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	if (unlikely(allow_smaller_maxphyaddr && !kvm_vcpu_is_legal_gpa(vcpu, gpa)))
 		return kvm_emulate_instruction(vcpu, 0);

-	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
+	return __vmx_handle_ept_violation(vcpu, gpa, exit_qualification);
 }

 static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
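To make the intended reuse concrete, a sketch of the TDX side (illustrative only; the actual TDX exit handler lands later in the series): the caller fetches the GPA and exit qualification from the TDX exit information instead of the VMCS, then defers to the shared helper.

/* Sketch: a TDX EPT-violation handler needs no VMCS reads of its own. */
static int example_tdx_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
				     unsigned long exit_qual)
{
	/* gpa and exit_qual were already extracted from the TDX exit info. */
	return __vmx_handle_ept_violation(vcpu, gpa, exit_qual);
}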
From patchwork Wed Feb 26 19:55:10 2025

From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe
Subject: [PATCH 10/29] KVM: VMX: Teach EPT violation helper about private mem
Date: Wed, 26 Feb 2025 14:55:10 -0500
Message-ID: <20250226195529.2314580-11-pbonzini@redhat.com>

From: Rick Edgecombe

Teach the EPT violation helper to check the shared mask of a GPA to find out whether the GPA is for private memory.

When an EPT violation is triggered by a TD accessing a private GPA, KVM exits to user space if the corresponding GFN's attribute is not private. User space then updates the GFN's attribute during its memory conversion process. After that, the TD re-accesses the private GPA and triggers the EPT violation again. Only when the GFN's attribute matches private does KVM fault in the private page, map it in the mirrored TDP root, and propagate the changes to the private EPT to resolve the EPT violation.

Relying on the GFN attribute-tracking xarray to determine whether a GFN is private, as is done for KVM_X86_SW_PROTECTED_VM, may lead to endless EPT violations.

Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Message-ID: <20241112073539.22056-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/common.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
index 78ae39b6cdcd..7a592467a044 100644
--- a/arch/x86/kvm/vmx/common.h
+++ b/arch/x86/kvm/vmx/common.h
@@ -6,6 +6,12 @@

 #include "mmu.h"

+static inline bool vt_is_tdx_private_gpa(struct kvm *kvm, gpa_t gpa)
+{
+	/* For TDX the direct mask is the shared mask. */
+	return !kvm_is_addr_direct(kvm, gpa);
+}
+
 static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
 					     unsigned long exit_qualification)
 {
@@ -28,6 +34,9 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
 		error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ?
 			      PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;

+	if (vt_is_tdx_private_gpa(vcpu->kvm, gpa))
+		error_code |= PFERR_PRIVATE_ACCESS;
+
 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }
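A worked example of the shared-bit test (the constant below is an assumption for a 4-level, PWL_4 configuration where the shared bit is GPA bit 47; it is not taken from the patch): an access to 0x800000100000 carries the shared bit and is shared memory, while 0x100000 does not, so its fault gets PFERR_PRIVATE_ACCESS.

/* Sketch under the stated PWL_4 assumption: shared bit = GPA bit 47. */
#define EXAMPLE_SHARED_BIT	BIT_ULL(47)

static bool example_is_private_gpa(gpa_t gpa)
{
	/* Private GPAs are exactly those with the shared bit clear. */
	return !(gpa & EXAMPLE_SHARED_BIT);
}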
From patchwork Wed Feb 26 19:55:11 2025

From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe, Isaku Yamahata
Subject: [PATCH 11/29] KVM: TDX: Add accessors VMX VMCS helpers
Date: Wed, 26 Feb 2025 14:55:11 -0500
Message-ID: <20250226195529.2314580-12-pbonzini@redhat.com>

From: Isaku Yamahata

TDX defines SEAMCALL APIs to access the TDX control structures corresponding to the VMX VMCS. Introduce helper accessors that hide the SEAMCALL ABI details.

Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Message-ID: <20241112073551.22070-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx.c | 13 +++++++
 arch/x86/kvm/vmx/tdx.h | 88 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d85f74c0a965..3fb5410cecb1 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -36,6 +36,19 @@ static enum cpuhp_state tdx_cpuhp_state;

 static const struct tdx_sys_info *tdx_sysinfo;

+void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err)
+{
+	KVM_BUG_ON(1, tdx->vcpu.kvm);
+	pr_err("TDH_VP_RD[%s.0x%x] failed 0x%llx\n", uclass, field, err);
+}
+
+void tdh_vp_wr_failed(struct vcpu_tdx *tdx, char *uclass, char *op, u32 field,
+		      u64 val, u64 err)
+{
+	KVM_BUG_ON(1, tdx->vcpu.kvm);
+	pr_err("TDH_VP_WR[%s.0x%x]%s0x%llx failed: 0x%llx\n", uclass, field, op, val, err);
+}
+
 #define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE)

 static __always_inline struct kvm_tdx *to_kvm_tdx(struct kvm *kvm)
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 5fea072b4f56..77d693dfcb08 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -47,6 +47,10 @@ struct vcpu_tdx {
 	enum vcpu_tdx_state state;
 };

+void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err);
+void tdh_vp_wr_failed(struct vcpu_tdx *tdx, char *uclass, char *op, u32 field,
+		      u64 val, u64 err);
+
 static inline bool is_td(struct kvm *kvm)
 {
 	return kvm->arch.vm_type == KVM_X86_TDX_VM;
@@ -68,6 +72,90 @@ static __always_inline u64 td_tdcs_exec_read64(struct kvm_tdx *kvm_tdx, u32 field
 	}
 	return data;
 }
+
+static __always_inline void tdvps_vmcs_check(u32 field, u8 bits)
+{
+#define VMCS_ENC_ACCESS_TYPE_MASK	0x1UL
+#define VMCS_ENC_ACCESS_TYPE_FULL	0x0UL
+#define VMCS_ENC_ACCESS_TYPE_HIGH	0x1UL
+#define VMCS_ENC_ACCESS_TYPE(field)	((field) & VMCS_ENC_ACCESS_TYPE_MASK)
+
+	/* TDX is 64bit only.  HIGH field isn't supported. */
+	BUILD_BUG_ON_MSG(__builtin_constant_p(field) &&
+			 VMCS_ENC_ACCESS_TYPE(field) == VMCS_ENC_ACCESS_TYPE_HIGH,
+			 "Read/Write to TD VMCS *_HIGH fields not supported");
+
+	BUILD_BUG_ON(bits != 16 && bits != 32 && bits != 64);
+
+#define VMCS_ENC_WIDTH_MASK	GENMASK(14, 13)
+#define VMCS_ENC_WIDTH_16BIT	(0UL << 13)
+#define VMCS_ENC_WIDTH_64BIT	(1UL << 13)
+#define VMCS_ENC_WIDTH_32BIT	(2UL << 13)
+#define VMCS_ENC_WIDTH_NATURAL	(3UL << 13)
+#define VMCS_ENC_WIDTH(field)	((field) & VMCS_ENC_WIDTH_MASK)
+
+	/* TDX is 64bit only.  i.e. natural width = 64bit. */
+	BUILD_BUG_ON_MSG(bits != 64 && __builtin_constant_p(field) &&
+			 (VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_64BIT ||
+			  VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_NATURAL),
+			 "Invalid TD VMCS access for 64-bit field");
+	BUILD_BUG_ON_MSG(bits != 32 && __builtin_constant_p(field) &&
+			 VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_32BIT,
+			 "Invalid TD VMCS access for 32-bit field");
+	BUILD_BUG_ON_MSG(bits != 16 && __builtin_constant_p(field) &&
+			 VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_16BIT,
+			 "Invalid TD VMCS access for 16-bit field");
+}
+
+#define TDX_BUILD_TDVPS_ACCESSORS(bits, uclass, lclass)				\
+static __always_inline u##bits td_##lclass##_read##bits(struct vcpu_tdx *tdx,	\
+							u32 field)		\
+{										\
+	u64 err, data;								\
+										\
+	tdvps_##lclass##_check(field, bits);					\
+	err = tdh_vp_rd(&tdx->vp, TDVPS_##uclass(field), &data);		\
+	if (unlikely(err)) {							\
+		tdh_vp_rd_failed(tdx, #uclass, field, err);			\
+		return 0;							\
+	}									\
+	return (u##bits)data;							\
+}										\
+static __always_inline void td_##lclass##_write##bits(struct vcpu_tdx *tdx,	\
+						      u32 field, u##bits val)	\
+{										\
+	u64 err;								\
+										\
+	tdvps_##lclass##_check(field, bits);					\
+	err = tdh_vp_wr(&tdx->vp, TDVPS_##uclass(field), val,			\
+			GENMASK_ULL(bits - 1, 0));				\
+	if (unlikely(err))							\
+		tdh_vp_wr_failed(tdx, #uclass, " = ", field, (u64)val, err);	\
+}										\
+static __always_inline void td_##lclass##_setbit##bits(struct vcpu_tdx *tdx,	\
+						       u32 field, u64 bit)	\
+{										\
+	u64 err;								\
+										\
+	tdvps_##lclass##_check(field, bits);					\
+	err = tdh_vp_wr(&tdx->vp, TDVPS_##uclass(field), bit, bit);		\
+	if (unlikely(err))							\
+		tdh_vp_wr_failed(tdx, #uclass, " |= ", field, bit, err);	\
+}										\
+static __always_inline void td_##lclass##_clearbit##bits(struct vcpu_tdx *tdx,	\
+							 u32 field, u64 bit)	\
+{										\
+	u64 err;								\
+										\
+	tdvps_##lclass##_check(field, bits);					\
+	err = tdh_vp_wr(&tdx->vp, TDVPS_##uclass(field), 0, bit);		\
+	if (unlikely(err))							\
+		tdh_vp_wr_failed(tdx, #uclass, " &= ~", field, bit, err);	\
+}
+
+TDX_BUILD_TDVPS_ACCESSORS(16, VMCS, vmcs);
+TDX_BUILD_TDVPS_ACCESSORS(32, VMCS, vmcs);
+TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs);
 #else
 static inline int tdx_bringup(void) { return 0; }
 static inline void tdx_cleanup(void) {}
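For a sense of what the generated accessors look like at a call site, a usage sketch: td_vmcs_write64() is produced by TDX_BUILD_TDVPS_ACCESSORS above, and SHARED_EPT_POINTER is the TD VMCS field a later patch in this series writes; the wrapper function itself is made up.

/* Sketch: the width check happens at build time, the SEAMCALL at run time. */
static void example_write_td_vmcs(struct vcpu_tdx *tdx, hpa_t root_hpa)
{
	td_vmcs_write64(tdx, SHARED_EPT_POINTER, root_hpa);
}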
From patchwork Wed Feb 26 19:55:12 2025

From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe, Sean Christopherson, Isaku Yamahata
Subject: [PATCH 12/29] KVM: TDX: Add load_mmu_pgd method for TDX
Date: Wed, 26 Feb 2025 14:55:12 -0500
Message-ID: <20250226195529.2314580-13-pbonzini@redhat.com>

From: Sean Christopherson
TDX uses two EPT pointers, one for the private half of the GPA space and one for the shared half. The private half uses the normal EPT_POINTER VMCS field, which is managed in a special way by the TDX module; KVM is not allowed to operate on it directly. The shared half uses a new SHARED_EPT_POINTER field and is managed by the conventional MMU management operations that operate directly on the EPT root. This means that for TDX the .load_mmu_pgd() operation needs to know to use the SHARED_EPT_POINTER field instead of the normal one.

Add a new wrapper in x86 ops for load_mmu_pgd() that directs the write either to the existing VMX implementation or to a TDX one. tdx_load_mmu_pgd() is much simpler than vmx_load_mmu_pgd(): in the TDX mode of operation EPT is always used and KVM does not need to be involved in virtualization of CR3 behavior, so tdx_load_mmu_pgd() can simply write to SHARED_EPT_POINTER.

Signed-off-by: Sean Christopherson
Co-developed-by: Isaku Yamahata
Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Reviewed-by: Paolo Bonzini
Message-ID: <20241112073601.22084-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini
---
 arch/x86/include/asm/vmx.h |  1 +
 arch/x86/kvm/vmx/main.c    | 13 ++++++++++++-
 arch/x86/kvm/vmx/tdx.c     | 15 +++++++++++++++
 arch/x86/kvm/vmx/x86_ops.h |  4 ++++
 4 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index f7fd4369b821..9298fb9d4bb3 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -256,6 +256,7 @@ enum vmcs_field {
 	TSC_MULTIPLIER_HIGH             = 0x00002033,
 	TERTIARY_VM_EXEC_CONTROL        = 0x00002034,
 	TERTIARY_VM_EXEC_CONTROL_HIGH   = 0x00002035,
+	SHARED_EPT_POINTER              = 0x0000203C,
 	PID_POINTER_TABLE               = 0x00002042,
 	PID_POINTER_TABLE_HIGH          = 0x00002043,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index e7d402b3a90d..8ed08c53c02f 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -98,6 +98,17 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vmx_vcpu_reset(vcpu, init_event);
 }

+static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
+			    int pgd_level)
+{
+	if (is_td_vcpu(vcpu)) {
+		tdx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
+		return;
+	}
+
+	vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -231,7 +242,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.write_tsc_offset = vmx_write_tsc_offset,
 	.write_tsc_multiplier = vmx_write_tsc_multiplier,

-	.load_mmu_pgd = vmx_load_mmu_pgd,
+	.load_mmu_pgd = vt_load_mmu_pgd,

 	.check_intercept = vmx_check_intercept,
 	.handle_exit_irqoff = vmx_handle_exit_irqoff,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 3fb5410cecb1..ec86c97ada80 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -32,6 +32,9 @@
 bool enable_tdx __ro_after_init;
 module_param_named(tdx, enable_tdx, bool, 0444);

+#define TDX_SHARED_BIT_PWL_5	gpa_to_gfn(BIT_ULL(51))
+#define TDX_SHARED_BIT_PWL_4	gpa_to_gfn(BIT_ULL(47))
+
 static enum cpuhp_state tdx_cpuhp_state;

 static const struct tdx_sys_info *tdx_sysinfo;
@@ -490,6 +493,18 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu)
 	tdx->state = VCPU_TD_STATE_UNINITIALIZED;
 }

+void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
+{
+	u64 shared_bit = (pgd_level == 5) ? TDX_SHARED_BIT_PWL_5 :
+					    TDX_SHARED_BIT_PWL_4;
+
+	if (KVM_BUG_ON(shared_bit != kvm_gfn_direct_bits(vcpu->kvm), vcpu->kvm))
+		return;
+
+	td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa);
+}
+
 static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
 {
 	const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index b6365939cf9f..ad1c2f3e546e 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -131,6 +131,8 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu);
 void tdx_vcpu_free(struct kvm_vcpu *vcpu);

 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
+
+void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level);
 #else
 static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; }
 static inline void tdx_mmu_release_hkid(struct kvm *kvm) {}
@@ -141,6 +143,8 @@ static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
 static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}

 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
+
+static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) {}
 #endif

 #endif /* __KVM_X86_VMX_X86_OPS_H */
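A quick worked check of the two shared-bit constants (my arithmetic, assuming PAGE_SHIFT == 12; not text from the patch):

/*
 * gpa_to_gfn() shifts right by PAGE_SHIFT, so:
 *   PWL_5: gpa_to_gfn(BIT_ULL(51)) == BIT_ULL(39), i.e. GFN bit 39
 *   PWL_4: gpa_to_gfn(BIT_ULL(47)) == BIT_ULL(35), i.e. GFN bit 35
 * tdx_load_mmu_pgd() insists the chosen value matches
 * kvm_gfn_direct_bits(), otherwise the KVM_BUG_ON() fires.
 */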
From patchwork Wed Feb 26 19:55:13 2025

From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe, Isaku Yamahata
Subject: [PATCH 13/29] KVM: TDX: Set gfn_direct_bits to shared bit
Date: Wed, 26 Feb 2025 14:55:13 -0500
Message-ID: <20250226195529.2314580-14-pbonzini@redhat.com>

From: Isaku Yamahata

Make the direct root handle memslot GFNs at an alias with the TDX shared bit set.

For TDX shared memory, the memslot GFNs need to be mapped at an alias with the shared bit set. These shared mappings will be mapped on the KVM MMU's "direct" root. The direct root has its mappings shifted by applying "gfn_direct_bits" as a mask. The concept of "GPAW" (guest physical address width) determines the location of the shared bit, so set gfn_direct_bits based on it to map shared memory at the proper GPA.

Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Reviewed-by: Paolo Bonzini
Message-ID: <20241112073613.22100-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index ec86c97ada80..09c4d314e6f5 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1045,6 +1045,11 @@ static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
 	kvm_tdx->attributes = td_params->attributes;
 	kvm_tdx->xfam = td_params->xfam;

+	if (td_params->config_flags & TDX_CONFIG_FLAGS_MAX_GPAW)
+		kvm->arch.gfn_direct_bits = TDX_SHARED_BIT_PWL_5;
+	else
+		kvm->arch.gfn_direct_bits = TDX_SHARED_BIT_PWL_4;
+
 	kvm_tdx->state = TD_STATE_INITIALIZED;
 out:
 	/* kfree() accepts NULL. */
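To illustrate the aliasing (a sketch assuming the PWL_4 shared bit at GFN bit 35): a shared access to memslot GFN g is handled on the direct root at the alias g | gfn_direct_bits, while the private half maps g unshifted in the mirror root.

/* Sketch: computing the direct-root alias of a memslot GFN. */
static gfn_t example_direct_alias(struct kvm *kvm, gfn_t gfn)
{
	/* e.g. gfn 0x1000 becomes 0x800001000 with the shared bit at GFN bit 35 */
	return gfn | kvm_gfn_direct_bits(kvm);
}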
From patchwork Wed Feb 26 19:55:14 2025

From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe, Isaku Yamahata
Subject: [PATCH 14/29] KVM: TDX: Require TDP MMU, mmio caching and EPT A/D bits for TDX
Date: Wed, 26 Feb 2025 14:55:14 -0500
Message-ID: <20250226195529.2314580-15-pbonzini@redhat.com>

From: Isaku Yamahata

Disable TDX support when the TDP MMU, MMIO caching, or EPT A/D bits aren't supported.

As the TDP MMU is now more mainstream than the legacy MMU, legacy-MMU support for TDX isn't implemented.

TDX requires KVM MMIO caching. Without MMIO caching, KVM would go to MMIO emulation without installing SPTEs for MMIOs. However, a TDX guest is protected, and KVM would hit errors when trying to emulate MMIOs for a TDX guest during instruction decoding. So a TDX guest relies on SPTEs being installed for MMIOs, with no RWX bits set and with the VE suppress bit unset, to inject #VE into the TDX guest. The TDX guest then issues a TDVMCALL in the #VE handler to perform instruction decoding and have the host do the MMIO emulation.

TDX also relies on EPT A/D bits, which have been supported in all CPUs since Haswell; relying on them avoids having the RWX bits masked out in the mirror page table for prefaulted entries.

Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Reviewed-by: Paolo Bonzini
Signed-off-by: Yan Zhao
---
Requested by Sean at [1].

[1] https://lore.kernel.org/kvm/Zva4aORxE9ljlMNe@google.com/

Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/mmu/mmu.c  |  1 +
 arch/x86/kvm/vmx/main.c |  1 +
 arch/x86/kvm/vmx/tdx.c  | 10 ++++++++++
 3 files changed, 12 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 800684c3b2c9..48a7e6f32f7f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -110,6 +110,7 @@ static bool __ro_after_init tdp_mmu_allowed;
 #ifdef CONFIG_X86_64
 bool __read_mostly tdp_mmu_enabled = true;
 module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0444);
+EXPORT_SYMBOL_GPL(tdp_mmu_enabled);
 #endif

 static int max_huge_page_level __read_mostly;
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 8ed08c53c02f..a4cb3d6b2986 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -3,6 +3,7 @@

 #include "x86_ops.h"
 #include "vmx.h"
+#include "mmu.h"
 #include "nested.h"
 #include "pmu.h"
 #include "posted_intr.h"
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 09c4d314e6f5..a44a50db9199 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1528,6 +1528,16 @@ int __init tdx_bringup(void)
 	if (!enable_tdx)
 		return 0;

+	if (!enable_ept) {
+		pr_err("EPT is required for TDX\n");
+		goto success_disable_tdx;
+	}
+
+	if (!tdp_mmu_enabled || !enable_mmio_caching || !enable_ept_ad_bits) {
+		pr_err("TDP MMU and MMIO caching and EPT A/D bit is required for TDX\n");
+		goto success_disable_tdx;
+	}
+
 	if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) {
 		pr_err("tdx: MOVDIR64B is required for TDX\n");
 		goto success_disable_tdx;

From patchwork Wed Feb 26 19:55:15 2025
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe, Isaku Yamahata
Subject: [PATCH 15/29] KVM: x86/mmu: Add setter for shadow_mmio_value
Message-ID: <20250226195529.2314580-16-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Isaku Yamahata

Future changes will want to set shadow_mmio_value from TDX code. Add a setter with a name that makes more sense from that context.

Signed-off-by: Isaku Yamahata
[split into new patch]
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
Reviewed-by: Paolo Bonzini
Message-ID: <20241112073730.22200-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini
---
arch/x86/kvm/mmu.h | 1 + arch/x86/kvm/mmu/spte.c | 6 ++++++ 2 files changed, 7 insertions(+) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 8d0fca4b4a50..47e64a3c4ce3 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -79,6 +79,7 @@ static inline gfn_t kvm_mmu_max_gfn(void) u8 kvm_mmu_get_max_tdp_level(void); void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask); +void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value); void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask); void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only); diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 22551e2f1d00..c42ac5d1f027 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -433,6 +433,12 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask) } EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask); +void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value) +{ + kvm->arch.shadow_mmio_value = mmio_value; +} +EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_value); + void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask) { /* shadow_me_value must be a subset of shadow_me_mask */
From patchwork Wed Feb 26 19:55:16 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993110

From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao , Rick Edgecombe , Isaku Yamahata
Subject: [PATCH 16/29] KVM: TDX: Set per-VM shadow_mmio_value to 0
Date: Wed, 26 Feb 2025 14:55:16 -0500
Message-ID: <20250226195529.2314580-17-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Isaku Yamahata

Set the per-VM shadow_mmio_value to 0 for TDX. With enable_mmio_caching on, KVM installs MMIO SPTEs for TDs. To correctly configure MMIO SPTEs, TDX requires the per-VM shadow_mmio_value to be set to 0. This is necessary to override the default value of the suppress-VE bit in the SPTE, which is 1, and to ensure that the RWX bits are 0.

For an MMIO SPTE, the SPTE value changes as follows:
1. Initial value (suppress-VE bit is set)
2. Guest issues MMIO and triggers an EPT violation
3. KVM updates the SPTE value to the MMIO value (suppress-VE bit is cleared)
4. Guest MMIO resumes; it triggers a #VE exception in the guest TD
5. Guest #VE handler issues TDG.VP.VMCALL
6. KVM handles the MMIO
7.
Guest VE handler resumes its execution after MMIO instruction Signed-off-by: Isaku Yamahata Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Co-developed-by: Yan Zhao Signed-off-by: Yan Zhao Reviewed-by: Paolo Bonzini Message-ID: <20241112073743.22214-1-yan.y.zhao@intel.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/spte.c | 2 -- arch/x86/kvm/vmx/tdx.c | 14 ++++++++++++++ 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index c42ac5d1f027..e819d16655b6 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -96,8 +96,6 @@ u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access) u64 spte = generation_mmio_spte_mask(gen); u64 gpa = gfn << PAGE_SHIFT; - WARN_ON_ONCE(!vcpu->kvm->arch.shadow_mmio_value); - access &= shadow_mmio_access_mask; spte |= vcpu->kvm->arch.shadow_mmio_value | access; spte |= gpa | shadow_nonpresent_or_rsvd_mask; diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index a44a50db9199..f42772a5492b 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -8,6 +8,7 @@ #include "x86_ops.h" #include "lapic.h" #include "tdx.h" +#include "mmu/spte.h" #pragma GCC poison to_vmx @@ -410,6 +411,19 @@ int tdx_vm_init(struct kvm *kvm) kvm->arch.has_protected_state = true; kvm->arch.has_private_mem = true; + /* + * Because guest TD is protected, VMM can't parse the instruction in TD. + * Instead, guest uses MMIO hypercall. For unmodified device driver, + * #VE needs to be injected for MMIO and #VE handler in TD converts MMIO + * instruction into MMIO hypercall. + * + * SPTE value for MMIO needs to be setup so that #VE is injected into + * TD instead of triggering EPT MISCONFIG. + * - RWX=0 so that EPT violation is triggered. + * - suppress #VE bit is cleared to inject #VE. + */ + kvm_mmu_set_mmio_spte_value(kvm, 0); + /* * TDX has its own limit of maximum vCPUs it can support for all * TDX guests in addition to KVM_MAX_VCPUS. 
TDX module reports
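To see why a zero shadow_mmio_value produces a #VE for the guest rather than an exit handled behind its back, the construction can be modeled standalone. The bit positions below (RWX in bits 2:0, suppress-VE in bit 63) follow the VMX EPT format, but the constants and the helper are simplified illustrations, not the kernel's definitions.

/* Standalone model of why shadow_mmio_value == 0 matters for a TD. */
#include <stdint.h>
#include <stdio.h>

#define EPT_RWX_MASK    0x7ULL
#define EPT_SUPPRESS_VE (1ULL << 63)

/* Simplified make_mmio_spte(): the per-VM mmio_value argument is what
 * kvm_mmu_set_mmio_spte_value(kvm, 0) configures in this patch. */
static uint64_t model_mmio_spte(uint64_t mmio_value, uint64_t gpa)
{
	return mmio_value | gpa;
}

static void describe(const char *what, uint64_t spte)
{
	int rwx = spte & EPT_RWX_MASK;
	int sve = !!(spte & EPT_SUPPRESS_VE);

	/* RWX == 0 makes every access fault; with suppress-VE also clear,
	 * the fault is delivered to the guest as a #VE instead. */
	printf("%s: RWX=%d suppress-VE=%d -> %s\n", what, rwx, sve,
	       (!rwx && !sve) ? "#VE injected into guest" : "handled by host/KVM");
}

int main(void)
{
	uint64_t gpa = 0xc0001000ULL;	/* example MMIO GPA */

	describe("TD (mmio_value = 0)      ", model_mmio_spte(0, gpa));
	describe("default (suppress-VE set)", model_mmio_spte(EPT_SUPPRESS_VE, gpa));
	return 0;
}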
From patchwork Wed Feb 26 19:55:17 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993111

From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao , Rick Edgecombe , Isaku Yamahata
Subject: [PATCH 17/29] KVM: TDX: Handle TLB tracking for TDX
Date: Wed, 26 Feb 2025 14:55:17 -0500
Message-ID: <20250226195529.2314580-18-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Isaku Yamahata

Handle TLB tracking for TDX by introducing the function tdx_track() for private memory TLB tracking and by implementing the flush_tlb* hooks to flush TLBs for shared memory.

Introduce the function tdx_track() to do TLB tracking on private memory, which basically does two things: call TDH.MEM.TRACK to increase the TD epoch, and kick off all vCPUs. The private EPT will then be flushed when each vCPU re-enters the TD. This function is temporarily unused in this patch and will be called on a page-by-page basis on removal of a private guest page in a later patch.

In earlier revisions, tdx_track() relied on an atomic counter to coordinate the synchronization between the actions of kicking off vCPUs, incrementing the TD epoch, and the vCPUs waiting for the incremented TD epoch after being kicked off. However, the core MMU only actually needs to call tdx_track() while already under a write mmu_lock, so this synchronization can be made unnecessary. vCPUs are kicked off only after the successful execution of TDH.MEM.TRACK, eliminating the need for vCPUs to wait for TDH.MEM.TRACK completion after being kicked off. tdx_track() is therefore able to send KVM_REQ_OUTSIDE_GUEST_MODE requests rather than KVM_REQ_TLB_FLUSH.

Hooks for flush_remote_tlbs and flush_remote_tlbs_range are not necessary for TDX, as tdx_track() will handle TLB tracking of private memory on a page-by-page basis when private guest pages are removed. There is no need to invoke tdx_track() again in kvm_flush_remote_tlbs(), even after changes to the mirrored page table.

For the hooks flush_tlb_current and flush_tlb_all, which are invoked during kvm_mmu_load() and vCPU load for normal VMs, let the VMM flush all EPTs in the two hooks for simplicity, since TDX does not depend on the two hooks to notify the TDX module to flush the private EPT in those cases.
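The epoch/kick ordering above can be illustrated with a small standalone model. Everything here is invented for illustration (the real coordination is done by the TDX module): mem_track() refuses to bump the epoch while any vCPU still runs with the previous epoch, and a kick after a successful bump is what unblocks the next bump.

/* Toy, single-threaded model of the TD-epoch TLB-tracking protocol. */
#include <stdbool.h>
#include <stdio.h>

#define NR_VCPUS 4

struct td {
	unsigned long epoch;		/* current TD epoch */
	long vcpu_epoch[NR_VCPUS];	/* epoch a vCPU entered with, -1 = not in guest */
};

/* TDH.MEM.TRACK model: busy if any vCPU still runs with an older epoch. */
static bool mem_track(struct td *td)
{
	for (int i = 0; i < NR_VCPUS; i++)
		if (td->vcpu_epoch[i] >= 0 &&
		    td->vcpu_epoch[i] < (long)td->epoch)
			return false;	/* previous epoch still live: busy */
	td->epoch++;
	return true;
}

/* Kick model: forces every vCPU out of guest mode; on re-entry the
 * TDX module would flush the TLB and stamp the new epoch. */
static void kick_all(struct td *td)
{
	for (int i = 0; i < NR_VCPUS; i++)
		td->vcpu_epoch[i] = -1;
}

int main(void)
{
	struct td td = { .epoch = 1, .vcpu_epoch = { 1, 1, 1, 1 } };

	/* tdx_track(): bump the epoch first, then kick; the kick is what
	 * makes the *next* mem_track() succeed without waiting on vCPUs. */
	printf("track #1: %s\n", mem_track(&td) ? "ok" : "busy");
	printf("track #2 before kick: %s\n", mem_track(&td) ? "ok" : "busy");
	kick_all(&td);
	printf("track #2 after kick:  %s\n", mem_track(&td) ? "ok" : "busy");
	return 0;
}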
Signed-off-by: Isaku Yamahata Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Co-developed-by: Yan Zhao Signed-off-by: Yan Zhao Message-ID: <20241112073753.22228-1-yan.y.zhao@intel.com> Signed-off-by: Paolo Bonzini --- arch/x86/kvm/vmx/main.c | 44 +++++++++++++++++++-- arch/x86/kvm/vmx/tdx.c | 81 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 4 ++ 3 files changed, 125 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index a4cb3d6b2986..0ea4bec0626e 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -99,6 +99,42 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) vmx_vcpu_reset(vcpu, init_event); } +static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) { + tdx_flush_tlb_all(vcpu); + return; + } + + vmx_flush_tlb_all(vcpu); +} + +static void vt_flush_tlb_current(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) { + tdx_flush_tlb_current(vcpu); + return; + } + + vmx_flush_tlb_current(vcpu); +} + +static void vt_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_flush_tlb_gva(vcpu, addr); +} + +static void vt_flush_tlb_guest(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_flush_tlb_guest(vcpu); +} + static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) { @@ -190,10 +226,10 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .set_rflags = vmx_set_rflags, .get_if_flag = vmx_get_if_flag, - .flush_tlb_all = vmx_flush_tlb_all, - .flush_tlb_current = vmx_flush_tlb_current, - .flush_tlb_gva = vmx_flush_tlb_gva, - .flush_tlb_guest = vmx_flush_tlb_guest, + .flush_tlb_all = vt_flush_tlb_all, + .flush_tlb_current = vt_flush_tlb_current, + .flush_tlb_gva = vt_flush_tlb_gva, + .flush_tlb_guest = vt_flush_tlb_guest, .vcpu_pre_run = vmx_vcpu_pre_run, .vcpu_run = vmx_vcpu_run, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f42772a5492b..b80fc035e471 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -8,6 +8,7 @@ #include "x86_ops.h" #include "lapic.h" #include "tdx.h" +#include "vmx.h" #include "mmu/spte.h" #pragma GCC poison to_vmx @@ -519,6 +520,51 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa); } +/* + * Ensure shared and private EPTs to be flushed on all vCPUs. + * tdh_mem_track() is the only caller that increases TD epoch. An increase in + * the TD epoch (e.g., to value "N + 1") is successful only if no vCPUs are + * running in guest mode with the value "N - 1". + * + * A successful execution of tdh_mem_track() ensures that vCPUs can only run in + * guest mode with TD epoch value "N" if no TD exit occurs after the TD epoch + * being increased to "N + 1". + * + * Kicking off all vCPUs after that further results in no vCPUs can run in guest + * mode with TD epoch value "N", which unblocks the next tdh_mem_track() (e.g. + * to increase TD epoch to "N + 2"). + * + * TDX module will flush EPT on the next TD enter and make vCPUs to run in + * guest mode with TD epoch value "N + 1". + * + * kvm_make_all_cpus_request() guarantees all vCPUs are out of guest mode by + * waiting empty IPI handler ack_kick(). + * + * No action is required to the vCPUs being kicked off since the kicking off + * occurs certainly after TD epoch increment and before the next + * tdh_mem_track(). 
+ */ +static void __always_unused tdx_track(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + u64 err; + + /* If TD isn't finalized, it's before any vcpu running. */ + if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE)) + return; + + lockdep_assert_held_write(&kvm->mmu_lock); + + do { + err = tdh_mem_track(&kvm_tdx->td); + } while (unlikely((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_BUSY)); + + if (KVM_BUG_ON(err, kvm)) + pr_tdx_error(TDH_MEM_TRACK, err); + + kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); +} + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) { const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; @@ -1073,6 +1119,41 @@ static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd) return ret; } +void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) +{ + /* + * flush_tlb_current() is invoked when the first time for the vcpu to + * run or when root of shared EPT is invalidated. + * KVM only needs to flush shared EPT because the TDX module handles TLB + * invalidation for private EPT in tdh_vp_enter(); + * + * A single context invalidation for shared EPT can be performed here. + * However, this single context invalidation requires the private EPTP + * rather than the shared EPTP to flush shared EPT, as shared EPT uses + * private EPTP as its ASID for TLB invalidation. + * + * To avoid reading back private EPTP, perform a global invalidation for + * shared EPT instead to keep this function simple. + */ + ept_sync_global(); +} + +void tdx_flush_tlb_all(struct kvm_vcpu *vcpu) +{ + /* + * TDX has called tdx_track() in tdx_sept_remove_private_spte() to + * ensure that private EPT will be flushed on the next TD enter. No need + * to call tdx_track() here again even when this callback is a result of + * zapping private EPT. + * + * Due to the lack of the context to determine which EPT has been + * affected by zapping, invoke invept() directly here for both shared + * EPT and private EPT for simplicity, though it's not necessary for + * private EPT. 
+ */ ept_sync_global(); } int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index ad1c2f3e546e..584db5fa8dab 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -132,6 +132,8 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); +void tdx_flush_tlb_current(struct kvm_vcpu *vcpu); +void tdx_flush_tlb_all(struct kvm_vcpu *vcpu); void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); #else static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } @@ -144,6 +146,8 @@ static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } +static inline void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) {} +static inline void tdx_flush_tlb_all(struct kvm_vcpu *vcpu) {} static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) {} #endif
From patchwork Wed Feb 26 19:55:18 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993113

From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao , Rick Edgecombe , Isaku Yamahata
Subject: [PATCH 18/29] KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page table
Date: Wed, 26 Feb 2025 14:55:18 -0500
Message-ID: <20250226195529.2314580-19-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Isaku Yamahata

Implement hooks in TDX to propagate changes of the mirror page table to the private EPT, including changes for page table page adding/removing and guest page adding/removing. TDX invokes the corresponding SEAMCALLs in the hooks.

- Hook link_external_spt propagates the addition of a page table page into the private EPT.

- Hook set_external_spte: tdx_sept_set_private_spte() in this patch only handles the addition of a guest private page when the TD is finalized. Later patches will handle the case of adding guest private pages before TD finalization.

- Hook free_external_spt is invoked when a page table page is removed from the mirror page table, which currently must occur at the TD tear-down phase, after the hkid is freed.

- Hook remove_external_spte is invoked when a guest private page is removed from the mirror page table, which can occur while the TD is active, e.g. during shared <-> private conversion and slot move/deletion. This hook is guaranteed to be triggered before the hkid is freed, because the gmem fd is released, with all private leaf mappings zapped, before the hkid is freed at VM destroy.

TDX invokes the following SEAMCALLs sequentially:
1) TDH.MEM.RANGE.BLOCK (removes RWX bits from a private EPT entry)
2) TDH.MEM.TRACK (increases the TD epoch)
3) TDH.MEM.PAGE.REMOVE (removes the private EPT entry and untracks the guest page)

TDH.MEM.PAGE.REMOVE can't succeed without TDH.MEM.RANGE.BLOCK and TDH.MEM.TRACK being called successfully. SEAMCALL TDH.MEM.TRACK is called in the function tdx_track() to enforce that TLB tracking will be performed by the TDX module for the private EPT; the sequence is sketched below.
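As a reading aid, here is a condensed, non-literal sketch of that removal sequence. The tdx_sept_zap_private_spte(), tdx_track(), and tdx_sept_drop_private_spte() helpers are the ones added by this patch (the real caller is tdx_sept_remove_private_spte()); retry-on-busy, the hkid sanity check, and error reporting are stripped down, so this shows the ordering rather than the patch code itself.

/* Kernel-context sketch (not standalone): ordering of the three SEAMCALLs. */
static int remove_private_page_sketch(struct kvm *kvm, gfn_t gfn,
				      enum pg_level level, kvm_pfn_t pfn)
{
	int ret;

	/* 1) TDH.MEM.RANGE.BLOCK: strip RWX from the private EPT entry */
	ret = tdx_sept_zap_private_spte(kvm, gfn, level);
	if (ret)
		return ret;

	/* 2) TDH.MEM.TRACK + kick all vCPUs: TLB tracking for private EPT */
	tdx_track(kvm);

	/* 3) TDH.MEM.PAGE.REMOVE: drop the entry and untrack the page */
	return tdx_sept_drop_private_spte(kvm, gfn, level, pfn_to_page(pfn));
}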
Signed-off-by: Isaku Yamahata Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Co-developed-by: Yan Zhao Signed-off-by: Yan Zhao --- - Remove TDX_ERROR_SEPT_BUSY and Add tdx_operand_busy() helper (Binbin) Signed-off-by: Paolo Bonzini --- arch/x86/kvm/vmx/main.c | 14 ++- arch/x86/kvm/vmx/tdx.c | 213 +++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx_arch.h | 23 ++++ arch/x86/kvm/vmx/x86_ops.h | 37 +++++++ 4 files changed, 284 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 0ea4bec0626e..0c94810b1f48 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -36,9 +36,21 @@ static __init int vt_hardware_setup(void) * is KVM may allocate couple of more bytes than needed for * each VM. */ - if (enable_tdx) + if (enable_tdx) { vt_x86_ops.vm_size = max_t(unsigned int, vt_x86_ops.vm_size, sizeof(struct kvm_tdx)); + /* + * Note, TDX may fail to initialize in a later time in + * vt_init(), in which case it is not necessary to setup + * those callbacks. But making them valid here even + * when TDX fails to init later is fine because those + * callbacks won't be called if the VM isn't TDX guest. + */ + vt_x86_ops.link_external_spt = tdx_sept_link_private_spt; + vt_x86_ops.set_external_spte = tdx_sept_set_private_spte; + vt_x86_ops.free_external_spt = tdx_sept_free_private_spt; + vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte; + } return 0; } diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index b80fc035e471..5f38c325dfa6 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -154,6 +154,12 @@ static DEFINE_MUTEX(tdx_lock); static atomic_t nr_configured_hkid; +static bool tdx_operand_busy(u64 err) +{ + return (err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_BUSY; +} + + static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx) { tdx_guest_keyid_free(kvm_tdx->hkid); @@ -520,6 +526,160 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa); } +static void tdx_unpin(struct kvm *kvm, struct page *page) +{ + put_page(page); +} + +static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn, + enum pg_level level, struct page *page) +{ + int tdx_level = pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + gpa_t gpa = gfn_to_gpa(gfn); + u64 entry, level_state; + u64 err; + + err = tdh_mem_page_aug(&kvm_tdx->td, gpa, tdx_level, page, &entry, &level_state); + if (unlikely(tdx_operand_busy(err))) { + tdx_unpin(kvm, page); + return -EBUSY; + } + + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error_2(TDH_MEM_PAGE_AUG, err, entry, level_state); + tdx_unpin(kvm, page); + return -EIO; + } + + return 0; +} + +int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + struct page *page = pfn_to_page(pfn); + + /* TODO: handle large pages. */ + if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm)) + return -EINVAL; + + /* + * Because guest_memfd doesn't support page migration with + * a_ops->migrate_folio (yet), no callback is triggered for KVM on page + * migration. Until guest_memfd supports page migration, prevent page + * migration. + * TODO: Once guest_memfd introduces callback on page migration, + * implement it and remove get_page/put_page(). 
+ */ + get_page(page); + + if (likely(kvm_tdx->state == TD_STATE_RUNNABLE)) + return tdx_mem_page_aug(kvm, gfn, level, page); + + /* + * TODO: KVM_TDX_INIT_MEM_REGION support to populate before finalize + * comes here for the initial memory. + */ + return -EOPNOTSUPP; +} + +static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, struct page *page) +{ + int tdx_level = pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + gpa_t gpa = gfn_to_gpa(gfn); + u64 err, entry, level_state; + + /* TODO: handle large pages. */ + if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm)) + return -EINVAL; + + if (KVM_BUG_ON(!is_hkid_assigned(kvm_tdx), kvm)) + return -EINVAL; + + do { + /* + * When zapping private page, write lock is held. So no race + * condition with other vcpu sept operation. Race only with + * TDH.VP.ENTER. + */ + err = tdh_mem_page_remove(&kvm_tdx->td, gpa, tdx_level, &entry, + &level_state); + } while (unlikely(tdx_operand_busy(err))); + + if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE && + err == (TDX_EPT_WALK_FAILED | TDX_OPERAND_ID_RCX))) { + /* + * This page was mapped with KVM_MAP_MEMORY, but + * KVM_TDX_INIT_MEM_REGION is not issued yet. + */ + if (!is_last_spte(entry, level) || !(entry & VMX_EPT_RWX_MASK)) { + tdx_unpin(kvm, page); + return 0; + } + } + + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error_2(TDH_MEM_PAGE_REMOVE, err, entry, level_state); + return -EIO; + } + + err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, page); + + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err); + return -EIO; + } + tdx_clear_page(page); + tdx_unpin(kvm, page); + return 0; +} + +int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt) +{ + int tdx_level = pg_level_to_tdx_sept_level(level); + gpa_t gpa = gfn_to_gpa(gfn); + struct page *page = virt_to_page(private_spt); + u64 err, entry, level_state; + + err = tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, tdx_level, page, &entry, + &level_state); + if (unlikely(tdx_operand_busy(err))) + return -EBUSY; + + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error_2(TDH_MEM_SEPT_ADD, err, entry, level_state); + return -EIO; + } + + return 0; +} + +static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level) +{ + int tdx_level = pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + gpa_t gpa = gfn_to_gpa(gfn) & KVM_HPAGE_MASK(level); + u64 err, entry, level_state; + + /* For now large page isn't supported yet. */ + WARN_ON_ONCE(level != PG_LEVEL_4K); + + err = tdh_mem_range_block(&kvm_tdx->td, gpa, tdx_level, &entry, &level_state); + if (unlikely(tdx_operand_busy(err))) + return -EBUSY; + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error_2(TDH_MEM_RANGE_BLOCK, err, entry, level_state); + return -EIO; + } + return 0; +} + /* * Ensure shared and private EPTs to be flushed on all vCPUs. * tdh_mem_track() is the only caller that increases TD epoch. An increase in @@ -544,7 +704,7 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) * occurs certainly after TD epoch increment and before the next * tdh_mem_track(). 
*/ -static void __always_unused tdx_track(struct kvm *kvm) +static void tdx_track(struct kvm *kvm) { struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); u64 err; @@ -557,7 +717,7 @@ static void __always_unused tdx_track(struct kvm *kvm) do { err = tdh_mem_track(&kvm_tdx->td); - } while (unlikely((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_BUSY)); + } while (unlikely(tdx_operand_busy(err))); if (KVM_BUG_ON(err, kvm)) pr_tdx_error(TDH_MEM_TRACK, err); @@ -565,6 +725,55 @@ static void __always_unused tdx_track(struct kvm *kvm) kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); } +int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + + /* + * free_external_spt() is only called after hkid is freed when TD is + * tearing down. + * KVM doesn't (yet) zap page table pages in mirror page table while + * TD is active, though guest pages mapped in mirror page table could be + * zapped during TD is active, e.g. for shared <-> private conversion + * and slot move/deletion. + */ + if (KVM_BUG_ON(is_hkid_assigned(kvm_tdx), kvm)) + return -EINVAL; + + /* + * The HKID assigned to this TD was already freed and cache was + * already flushed. We don't have to flush again. + */ + return tdx_reclaim_page(virt_to_page(private_spt)); +} + +int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + int ret; + + /* + * HKID is released after all private pages have been removed, and set + * before any might be populated. Warn if zapping is attempted when + * there can't be anything populated in the private EPT. + */ + if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm)) + return -EINVAL; + + ret = tdx_sept_zap_private_spte(kvm, gfn, level); + if (ret) + return ret; + + /* + * TDX requires TLB tracking before dropping private page. Do + * it here, although it is also done later. 
+ */ tdx_track(kvm); + + return tdx_sept_drop_private_spte(kvm, gfn, level, pfn_to_page(pfn)); +} + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) { const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h index b6fe458a4224..a8071409498f 100644 --- a/arch/x86/kvm/vmx/tdx_arch.h +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -120,6 +120,29 @@ struct td_params { #define TDX_MIN_TSC_FREQUENCY_KHZ (100 * 1000) #define TDX_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000) +/* Additional Secure EPT entry information */ +#define TDX_SEPT_LEVEL_MASK GENMASK_ULL(2, 0) +#define TDX_SEPT_STATE_MASK GENMASK_ULL(15, 8) +#define TDX_SEPT_STATE_SHIFT 8 + +enum tdx_sept_entry_state { + TDX_SEPT_FREE = 0, + TDX_SEPT_BLOCKED = 1, + TDX_SEPT_PENDING = 2, + TDX_SEPT_PENDING_BLOCKED = 3, + TDX_SEPT_PRESENT = 4, +}; + +static inline u8 tdx_get_sept_level(u64 sept_entry_info) +{ + return sept_entry_info & TDX_SEPT_LEVEL_MASK; +} + +static inline u8 tdx_get_sept_state(u64 sept_entry_info) +{ + return (sept_entry_info & TDX_SEPT_STATE_MASK) >> TDX_SEPT_STATE_SHIFT; +} + #define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM BIT_ULL(20) /* diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 584db5fa8dab..6344548d6a7a 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -132,6 +132,15 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); +int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt); +int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt); +int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn); +int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn); + void tdx_flush_tlb_current(struct kvm_vcpu *vcpu); void tdx_flush_tlb_all(struct kvm_vcpu *vcpu); void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); @@ -146,6 +155,34 @@ static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } +static inline int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, + void *private_spt) +{ + return -EOPNOTSUPP; +} + +static inline int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, + void *private_spt) +{ + return -EOPNOTSUPP; +} + +static inline int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, + kvm_pfn_t pfn) +{ + return -EOPNOTSUPP; +} + +static inline int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, + kvm_pfn_t pfn) +{ + return -EOPNOTSUPP; +} + static inline void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) {} static inline void tdx_flush_tlb_all(struct kvm_vcpu *vcpu) {} static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) {}
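The entry-info helpers added to tdx_arch.h above can be exercised in isolation. In the sketch below the masks mirror the patch (GENMASK_ULL(2, 0) and GENMASK_ULL(15, 8) expanded by hand), but the sample info word is fabricated purely to demonstrate the decoding; it is not a value captured from hardware.

/* Standalone demo of the SEPT entry-info decoding added in tdx_arch.h. */
#include <stdint.h>
#include <stdio.h>

#define TDX_SEPT_LEVEL_MASK  0x7ULL	/* GENMASK_ULL(2, 0) */
#define TDX_SEPT_STATE_MASK  0xff00ULL	/* GENMASK_ULL(15, 8) */
#define TDX_SEPT_STATE_SHIFT 8

static const char *state_name[] = {
	"FREE", "BLOCKED", "PENDING", "PENDING_BLOCKED", "PRESENT",
};

int main(void)
{
	/* Fabricated sample: state PRESENT (4), level 0 (4K mapping). */
	uint64_t info = (4ULL << TDX_SEPT_STATE_SHIFT) | 0;
	unsigned int level = info & TDX_SEPT_LEVEL_MASK;
	unsigned int state = (info & TDX_SEPT_STATE_MASK) >> TDX_SEPT_STATE_SHIFT;

	printf("SEPT level %u, state %s\n", level,
	       state < 5 ? state_name[state] : "unknown");
	return 0;
}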
From patchwork Wed Feb 26 19:55:19 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993114

From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao , Rick Edgecombe , Isaku Yamahata
Subject: [PATCH 19/29] KVM: TDX: Implement hook to get max mapping level of private pages
Date: Wed, 26 Feb 2025 14:55:19 -0500
Message-ID: <20250226195529.2314580-20-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>
From: Isaku Yamahata

Implement the hook private_max_mapping_level for TDX to let the TDP MMU core get the max mapping level of private pages. The value is hard-coded to PG_LEVEL_4K for now, as huge pages are not yet supported.

Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Reviewed-by: Paolo Bonzini
Message-ID: <20241112073816.22256-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini
---
arch/x86/kvm/vmx/main.c | 10 ++++++++++ arch/x86/kvm/vmx/tdx.c | 5 +++++ arch/x86/kvm/vmx/x86_ops.h | 2 ++ 3 files changed, 17 insertions(+) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 0c94810b1f48..828168e67d4e 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -174,6 +174,14 @@ static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp) return tdx_vcpu_ioctl(vcpu, argp); } +static int vt_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn) +{ + if (is_td(kvm)) + return tdx_gmem_private_max_mapping_level(kvm, pfn); + + return 0; +} + #define VMX_REQUIRED_APICV_INHIBITS \ (BIT(APICV_INHIBIT_REASON_DISABLED) | \ BIT(APICV_INHIBIT_REASON_ABSENT) | \ @@ -331,6 +339,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .mem_enc_ioctl = vt_mem_enc_ioctl, .vcpu_mem_enc_ioctl = vt_vcpu_mem_enc_ioctl, + + .private_max_mapping_level = vt_gmem_private_max_mapping_level }; struct kvm_x86_init_ops vt_init_ops __initdata = { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 5f38c325dfa6..989db4887963 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1630,6 +1630,11 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) return ret; } +int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn) +{ + return PG_LEVEL_4K; +} + static int tdx_online_cpu(unsigned int cpu) { unsigned long flags; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 6344548d6a7a..bc3cf1c1da37 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -144,6 +144,7 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, void tdx_flush_tlb_current(struct kvm_vcpu *vcpu); void tdx_flush_tlb_all(struct kvm_vcpu *vcpu); void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); +int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn); #else static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} @@ -186,6 +187,7 @@ static inline void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) {} static inline void tdx_flush_tlb_all(struct kvm_vcpu *vcpu) {} static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) {} +static inline int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn) { return 0; } #endif #endif /* __KVM_X86_VMX_X86_OPS_H */
From patchwork Wed Feb 26 19:55:20 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993116

From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao , Rick Edgecombe
Subject: [PATCH 20/29] KVM: x86/mmu: Bail out kvm_tdp_map_page() when VM dead
Date: Wed, 26 Feb 2025 14:55:20 -0500
Message-ID: <20250226195529.2314580-21-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>
From: Yan Zhao

Bail out of the loop in kvm_tdp_map_page() when the VM is dead. Otherwise, kvm_tdp_map_page() may get stuck in a kernel loop when there's only one vCPU in the VM (or if the other vCPUs are not executing ioctls), even if fatal errors have occurred.

kvm_tdp_map_page() is called by the ioctl KVM_PRE_FAULT_MEMORY or the TDX ioctl KVM_TDX_INIT_MEM_REGION. It loops in the kernel whenever RET_PF_RETRY is returned. In the TDP MMU, kvm_tdp_mmu_map() always returns RET_PF_RETRY, regardless of the specific error code from tdp_mmu_set_spte_atomic(), tdp_mmu_link_sp(), or tdp_mmu_split_huge_page(). While this is acceptable in general cases, where the only possible error code from these functions is -EBUSY, TDX introduces an additional error code, -EIO, due to SEAMCALL errors. Since this -EIO error is also fatal, check for a dead VM in kvm_tdp_map_page() to avoid unnecessary retries until a signal is pending.

The -EIO error is uncommon and has not been observed in real workloads. Currently, it is only hypothetically triggered by bypassing the real SEAMCALL and faking an error in the SEAMCALL wrapper.

Signed-off-by: Yan Zhao
Message-ID: <20250220102728.24546-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini
---
arch/x86/kvm/mmu/mmu.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 48a7e6f32f7f..f4e18b33a21a 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4700,6 +4700,10 @@ int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level do { if (signal_pending(current)) return -EINTR; + + if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) + return -EIO; + cond_resched(); r = kvm_mmu_do_page_fault(vcpu, gpa, error_code, true, NULL, level); } while (r == RET_PF_RETRY);
From patchwork Wed Feb 26 19:55:21 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993115

From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao , Rick Edgecombe
Subject: [PATCH 21/29] KVM: x86/mmu: Export kvm_tdp_map_page()
Date: Wed, 26 Feb 2025 14:55:21 -0500
Message-ID: <20250226195529.2314580-22-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Rick Edgecombe

In future changes, CoCo-specific code will need to call kvm_tdp_map_page() from within its gmem_post_populate() callbacks. Export it so this can be done from vendor-specific code. Since kvm_mmu_reload() will be needed for this operation, export its callee kvm_mmu_load() as well. A sketch of the intended call pattern follows below.
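The following is a kernel-context sketch, not the actual TDX user (that arrives in a later patch of this series): it shows how a vendor gmem_post_populate() callback is expected to combine the two exports. Error handling and locking are elided; the PFERR_* error code follows the usage in the TDX patch below.

/* Hedged sketch of the call pattern the exports enable. */
static int vendor_post_populate_map(struct kvm_vcpu *vcpu, gpa_t gpa)
{
	u64 error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS;
	u8 level = PG_LEVEL_4K;
	int ret;

	/* kvm_mmu_reload() calls the now-exported kvm_mmu_load() if the
	 * MMU root is invalid, so the TDP walk below has a valid root. */
	ret = kvm_mmu_reload(vcpu);
	if (ret)
		return ret;

	return kvm_tdp_map_page(vcpu, gpa, error_code, &level);
}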
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
Message-ID: <20241112073827.22270-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini
---
arch/x86/kvm/mmu/mmu.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f4e18b33a21a..443c4836bc62 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4728,6 +4728,7 @@ int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level return -EIO; } } +EXPORT_SYMBOL_GPL(kvm_tdp_map_page); long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu, struct kvm_pre_fault_memory *range) @@ -5751,6 +5752,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu) out: return r; } +EXPORT_SYMBOL_GPL(kvm_mmu_load); void kvm_mmu_unload(struct kvm_vcpu *vcpu) {
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe, Isaku Yamahata
Subject: [PATCH 22/29] KVM: TDX: Add an ioctl to create initial guest memory
Date: Wed, 26 Feb 2025 14:55:22 -0500
Message-ID: <20250226195529.2314580-23-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Isaku Yamahata

Add a new ioctl for the user space VMM to initialize guest memory with the
specified memory contents.

Because TDX protects the guest's memory, the creation of the initial guest
memory requires a dedicated TDX module API, TDH.MEM.PAGE.ADD(), instead of
directly copying the memory contents into the guest's memory as is done for
the default VM type.

Define a new subcommand, KVM_TDX_INIT_MEM_REGION, of vCPU-scoped
KVM_MEMORY_ENCRYPT_OP. Check if the GFN is already pre-allocated, assign
the guest page in Secure-EPT, copy the initial memory contents into the
guest memory, and encrypt the guest memory. Optionally, extend the memory
measurement of the TDX guest.

The ioctl uses the vCPU file descriptor because of the TDX module's
requirement that the memory is added to the S-EPT (via TDH.MEM.SEPT.ADD)
prior to initialization (TDH.MEM.PAGE.ADD). Accessing the MMU in turn
requires a vCPU file descriptor, just like for KVM_PRE_FAULT_MEMORY. In
fact, the post-populate callback is able to reuse the same logic used by
KVM_PRE_FAULT_MEMORY, so that userspace can do everything with a single
ioctl.

Note that this is the only way to invoke TDH.MEM.SEPT.ADD before the TD is
finalized, as userspace cannot use KVM_PRE_FAULT_MEMORY at that point. This
ensures that there cannot be pages in the S-EPT awaiting TDH.MEM.PAGE.ADD,
which would be treated incorrectly as spurious by
tdp_mmu_map_handle_target_level() (KVM would see the SPTE as PRESENT, but
the corresponding S-EPT entry will be !PRESENT).
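For reference, the new subcommand is driven from userspace like the other
KVM_MEMORY_ENCRYPT_OP subcommands. The following is a minimal sketch, not
authoritative VMM code: it assumes td_vcpu_fd is an already-initialized TDX
vCPU file descriptor whose TD has not been finalized, that the GPA range is
backed by a guest_memfd memslot, and that struct kvm_tdx_cmd is initialized
per its kernel-side usage shown in this series (id/flags/data).

#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Minimal sketch: add nr_pages pages of initial contents at src to the TD
 * at gpa, extending the TD measurement over them.
 */
static int td_init_mem_region(int td_vcpu_fd, void *src, __u64 gpa,
                              __u64 nr_pages)
{
        struct kvm_tdx_init_mem_region region = {
                .source_addr = (__u64)(unsigned long)src,
                .gpa = gpa,
                .nr_pages = nr_pages,
        };
        struct kvm_tdx_cmd cmd = {
                .id = KVM_TDX_INIT_MEM_REGION,
                .flags = KVM_TDX_MEASURE_MEMORY_REGION,
                .data = (__u64)(unsigned long)&region,
        };

        /* vCPU-scoped: issued on the vCPU fd, not the VM fd. */
        return ioctl(td_vcpu_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}

On -EINTR the kernel copies the partially-advanced region back to userspace,
so the call can simply be reissued with the same struct to resume.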
Suggested-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
---
- KVM_BUG_ON() for kvm_tdx->nr_premapped (Paolo)
- Use tdx_operand_busy()
- Merge first patch in SEPT SEAMCALL retry series into this base (Paolo)

Signed-off-by: Paolo Bonzini
---
 arch/x86/include/uapi/asm/kvm.h |   9 ++
 arch/x86/kvm/vmx/tdx.c          | 144 ++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c             |   1 +
 3 files changed, 154 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index cd55484e3f0c..ee1b517351a3 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -932,6 +932,7 @@ enum kvm_tdx_cmd_id {
 	KVM_TDX_CAPABILITIES = 0,
 	KVM_TDX_INIT_VM,
 	KVM_TDX_INIT_VCPU,
+	KVM_TDX_INIT_MEM_REGION,
 	KVM_TDX_GET_CPUID,
 
 	KVM_TDX_CMD_NR_MAX,
@@ -987,4 +988,12 @@ struct kvm_tdx_init_vm {
 	struct kvm_cpuid2 cpuid;
 };
 
+#define KVM_TDX_MEASURE_MEMORY_REGION	_BITULL(0)
+
+struct kvm_tdx_init_mem_region {
+	__u64 source_addr;
+	__u64 gpa;
+	__u64 nr_pages;
+};
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 989db4887963..12f3433d062d 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <...>
 #include <...>
 #include <...>
 #include <...>
@@ -10,6 +11,7 @@
 #include "tdx.h"
 #include "vmx.h"
 #include "mmu/spte.h"
+#include "common.h"
 
 #pragma GCC poison to_vmx
 
@@ -1600,6 +1602,145 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
 	return 0;
 }
 
+struct tdx_gmem_post_populate_arg {
+	struct kvm_vcpu *vcpu;
+	__u32 flags;
+};
+
+static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
+				  void __user *src, int order, void *_arg)
+{
+	u64 error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS;
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	struct tdx_gmem_post_populate_arg *arg = _arg;
+	struct kvm_vcpu *vcpu = arg->vcpu;
+	gpa_t gpa = gfn_to_gpa(gfn);
+	u8 level = PG_LEVEL_4K;
+	struct page *src_page;
+	int ret, i;
+	u64 err, entry, level_state;
+
+	/*
+	 * Get the source page if it has been faulted in. Return failure if the
+	 * source page has been swapped out or unmapped in primary memory.
+	 */
+	ret = get_user_pages_fast((unsigned long)src, 1, 0, &src_page);
+	if (ret < 0)
+		return ret;
+	if (ret != 1)
+		return -ENOMEM;
+
+	ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level);
+	if (ret < 0)
+		goto out;
+
+	/*
+	 * The private mem cannot be zapped after kvm_tdp_map_page()
+	 * because all paths are covered by slots_lock and the
+	 * filemap invalidate lock. Check that they are indeed enough.
+	 */
+	if (IS_ENABLED(CONFIG_KVM_PROVE_MMU)) {
+		scoped_guard(read_lock, &kvm->mmu_lock) {
+			if (KVM_BUG_ON(!kvm_tdp_mmu_gpa_is_mapped(vcpu, gpa), kvm)) {
+				ret = -EIO;
+				goto out;
+			}
+		}
+	}
+
+	ret = 0;
+	err = tdh_mem_page_add(&kvm_tdx->td, gpa, pfn_to_page(pfn), src_page,
+			       &entry, &level_state);
+	if (err) {
+		ret = unlikely(tdx_operand_busy(err)) ? -EBUSY : -EIO;
+		goto out;
+	}
+
+	if (arg->flags & KVM_TDX_MEASURE_MEMORY_REGION) {
+		for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) {
+			err = tdh_mr_extend(&kvm_tdx->td, gpa + i, &entry,
+					    &level_state);
+			if (err) {
+				ret = -EIO;
+				break;
+			}
+		}
+	}
+
+out:
+	put_page(src_page);
+	return ret;
+}
+
+static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	struct kvm_tdx_init_mem_region region;
+	struct tdx_gmem_post_populate_arg arg;
+	long gmem_ret;
+	int ret;
+
+	if (tdx->state != VCPU_TD_STATE_INITIALIZED)
+		return -EINVAL;
+
+	guard(mutex)(&kvm->slots_lock);
+
+	/* Once TD is finalized, the initial guest memory is fixed. */
+	if (kvm_tdx->state == TD_STATE_RUNNABLE)
+		return -EINVAL;
+
+	if (cmd->flags & ~KVM_TDX_MEASURE_MEMORY_REGION)
+		return -EINVAL;
+
+	if (copy_from_user(&region, u64_to_user_ptr(cmd->data), sizeof(region)))
+		return -EFAULT;
+
+	if (!PAGE_ALIGNED(region.source_addr) || !PAGE_ALIGNED(region.gpa) ||
+	    !region.nr_pages ||
+	    region.gpa + (region.nr_pages << PAGE_SHIFT) <= region.gpa ||
+	    !vt_is_tdx_private_gpa(kvm, region.gpa) ||
+	    !vt_is_tdx_private_gpa(kvm, region.gpa + (region.nr_pages << PAGE_SHIFT) - 1))
+		return -EINVAL;
+
+	kvm_mmu_reload(vcpu);
+	ret = 0;
+	while (region.nr_pages) {
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			break;
+		}
+
+		arg = (struct tdx_gmem_post_populate_arg) {
+			.vcpu = vcpu,
+			.flags = cmd->flags,
+		};
+		gmem_ret = kvm_gmem_populate(kvm, gpa_to_gfn(region.gpa),
+					     u64_to_user_ptr(region.source_addr),
+					     1, tdx_gmem_post_populate, &arg);
+		if (gmem_ret < 0) {
+			ret = gmem_ret;
+			break;
+		}
+
+		if (gmem_ret != 1) {
+			ret = -EIO;
+			break;
+		}
+
+		region.source_addr += PAGE_SIZE;
+		region.gpa += PAGE_SIZE;
+		region.nr_pages--;
+
+		cond_resched();
+	}
+
+	if (copy_to_user(u64_to_user_ptr(cmd->data), &region, sizeof(region)))
+		ret = -EFAULT;
+	return ret;
+}
+
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
@@ -1619,6 +1760,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 	case KVM_TDX_INIT_VCPU:
 		ret = tdx_vcpu_init(vcpu, &cmd);
 		break;
+	case KVM_TDX_INIT_MEM_REGION:
+		ret = tdx_vcpu_init_mem_region(vcpu, &cmd);
+		break;
 	case KVM_TDX_GET_CPUID:
 		ret = tdx_vcpu_get_cpuid(vcpu, &cmd);
 		break;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 622b5a99078a..c88acfa154e6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2581,6 +2581,7 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_memslot);
 
 bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
 {

From patchwork Wed Feb 26 19:55:23 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993119
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe, Isaku Yamahata, Adrian Hunter
Subject: [PATCH 23/29] KVM: TDX: Finalize VM initialization
Date: Wed, 26 Feb 2025 14:55:23 -0500
Message-ID: <20250226195529.2314580-24-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Isaku Yamahata

Add a new VM-scoped KVM_MEMORY_ENCRYPT_OP IOCTL subcommand,
KVM_TDX_FINALIZE_VM, to perform TD
Measurement Finalization.

Documentation for the API is added in another patch:
"Documentation/virt/kvm: Document on Trust Domain Extensions(TDX)"

For the purpose of attestation, a measurement must be made of the TDX VM
initial state. This is referred to as TD Measurement Finalization, and uses
SEAMCALL TDH.MR.FINALIZE, after which:

1. The VMM adding TD private pages with arbitrary content is no longer
   allowed
2. The TDX VM is runnable
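Userspace then concludes initial-memory setup with a single VM-scoped call.
A minimal sketch follows (not authoritative VMM code): td_vm_fd is assumed
to be the TD's VM file descriptor, after every KVM_TDX_INIT_MEM_REGION call
has completed.

#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Minimal sketch: finalize the TD measurement so the TD becomes runnable.
 * A busy TDX module shows up as errno == EBUSY; the call can be retried.
 */
static int td_finalize(int td_vm_fd)
{
        struct kvm_tdx_cmd cmd = {
                .id = KVM_TDX_FINALIZE_VM,
        };
        int ret;

        do {
                ret = ioctl(td_vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
        } while (ret < 0 && errno == EBUSY);

        return ret;
}

Once this succeeds, KVM_PRE_FAULT_MEMORY becomes usable (pre_fault_allowed
is set below) and any further KVM_TDX_INIT_MEM_REGION call fails with
-EINVAL, matching the semantics described above.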
Co-developed-by: Adrian Hunter
Signed-off-by: Adrian Hunter
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
Message-ID: <20240904030751.117579-21-rick.p.edgecombe@intel.com>
Signed-off-by: Paolo Bonzini
---
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/vmx/tdx.c          | 78 +++++++++++++++++++++++++++++----
 arch/x86/kvm/vmx/tdx.h          |  3 ++
 3 files changed, 74 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index ee1b517351a3..89cc7a18ef45 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -933,6 +933,7 @@ enum kvm_tdx_cmd_id {
 	KVM_TDX_INIT_VM,
 	KVM_TDX_INIT_VCPU,
 	KVM_TDX_INIT_MEM_REGION,
+	KVM_TDX_FINALIZE_VM,
 	KVM_TDX_GET_CPUID,
 
 	KVM_TDX_CMD_NR_MAX,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 12f3433d062d..246f924ae6f6 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -557,6 +557,29 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
+/*
+ * KVM_TDX_INIT_MEM_REGION calls kvm_gmem_populate() to map guest pages; the
+ * callback tdx_gmem_post_populate() then maps pages into private memory
+ * through the SEAMCALL TDH.MEM.PAGE.ADD(). The SEAMCALL also requires the
+ * private EPT structures for the page to have been built before, which is
+ * done via kvm_tdp_map_page(). nr_premapped counts the number of pages that
+ * were added to the EPT structures but not added with TDH.MEM.PAGE.ADD().
+ * The counter has to be zero on KVM_TDX_FINALIZE_VM, to ensure that there
+ * are no half-initialized shared EPT pages.
+ */
+static int tdx_mem_page_record_premap_cnt(struct kvm *kvm, gfn_t gfn,
+					  enum pg_level level, kvm_pfn_t pfn)
+{
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+
+	if (KVM_BUG_ON(kvm->arch.pre_fault_allowed, kvm))
+		return -EINVAL;
+
+	/* nr_premapped will be decreased when tdh_mem_page_add() is called. */
+	atomic64_inc(&kvm_tdx->nr_premapped);
+	return 0;
+}
+
 int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 			      enum pg_level level, kvm_pfn_t pfn)
 {
@@ -577,14 +600,15 @@ int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 	 */
 	get_page(page);
 
+	/*
+	 * Read 'pre_fault_allowed' before 'kvm_tdx->state'; see matching
+	 * barrier in tdx_td_finalize().
+	 */
+	smp_rmb();
 	if (likely(kvm_tdx->state == TD_STATE_RUNNABLE))
 		return tdx_mem_page_aug(kvm, gfn, level, page);
 
-	/*
-	 * TODO: KVM_TDX_INIT_MEM_REGION support to populate before finalize
-	 * comes here for the initial memory.
-	 */
-	return -EOPNOTSUPP;
+	return tdx_mem_page_record_premap_cnt(kvm, gfn, level, pfn);
 }
 
 static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
@@ -615,10 +639,12 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
 	if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE &&
 		     err == (TDX_EPT_WALK_FAILED | TDX_OPERAND_ID_RCX))) {
 		/*
-		 * This page was mapped with KVM_MAP_MEMORY, but
-		 * KVM_TDX_INIT_MEM_REGION is not issued yet.
+		 * Page is mapped by KVM_TDX_INIT_MEM_REGION, but hasn't called
+		 * tdh_mem_page_add().
 		 */
-		if (!is_last_spte(entry, level) || !(entry & VMX_EPT_RWX_MASK)) {
+		if ((!is_last_spte(entry, level) || !(entry & VMX_EPT_RWX_MASK)) &&
+		    !KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm)) {
+			atomic64_dec(&kvm_tdx->nr_premapped);
 			tdx_unpin(kvm, page);
 			return 0;
 		}
@@ -1365,6 +1391,36 @@ void tdx_flush_tlb_all(struct kvm_vcpu *vcpu)
 	ept_sync_global();
 }
 
+static int tdx_td_finalize(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
+{
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+
+	guard(mutex)(&kvm->slots_lock);
+
+	if (!is_hkid_assigned(kvm_tdx) || kvm_tdx->state == TD_STATE_RUNNABLE)
+		return -EINVAL;
+	/*
+	 * Pages are pending for KVM_TDX_INIT_MEM_REGION to issue
+	 * TDH.MEM.PAGE.ADD().
+	 */
+	if (atomic64_read(&kvm_tdx->nr_premapped))
+		return -EINVAL;
+
+	cmd->hw_error = tdh_mr_finalize(&kvm_tdx->td);
+	if (tdx_operand_busy(cmd->hw_error))
+		return -EBUSY;
+	if (KVM_BUG_ON(cmd->hw_error, kvm)) {
+		pr_tdx_error(TDH_MR_FINALIZE, cmd->hw_error);
+		return -EIO;
+	}
+
+	kvm_tdx->state = TD_STATE_RUNNABLE;
+	/* TD_STATE_RUNNABLE must be set before 'pre_fault_allowed' */
+	smp_wmb();
+	kvm->arch.pre_fault_allowed = true;
+	return 0;
+}
+
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_tdx_cmd tdx_cmd;
@@ -1389,6 +1445,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_TDX_INIT_VM:
 		r = tdx_td_init(kvm, &tdx_cmd);
 		break;
+	case KVM_TDX_FINALIZE_VM:
+		r = tdx_td_finalize(kvm, &tdx_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -1656,6 +1715,9 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 		goto out;
 	}
 
+	if (!KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm))
+		atomic64_dec(&kvm_tdx->nr_premapped);
+
 	if (arg->flags & KVM_TDX_MEASURE_MEMORY_REGION) {
 		for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) {
 			err = tdh_mr_extend(&kvm_tdx->td, gpa + i, &entry,
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 77d693dfcb08..b71f8e9643e4 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -31,6 +31,9 @@ struct kvm_tdx {
 	u64 tsc_offset;
 
 	struct tdx_td td;
+
+	/* For KVM_TDX_INIT_MEM_REGION. */
+	atomic64_t nr_premapped;
 };
 
 /* TDX module vCPU states */

From patchwork Wed Feb 26 19:55:24 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993117
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe, Isaku Yamahata
Subject: [PATCH 24/29] KVM: TDX: Handle vCPU dissociation
Date: Wed, 26 Feb 2025 14:55:24 -0500
Message-ID: <20250226195529.2314580-25-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Isaku Yamahata

Handle vCPU dissociation by invoking SEAMCALL TDH.VP.FLUSH, which flushes
the address translation caches and the cached TD VMCS of a TD vCPU on its
associated pCPU.

In TDX, a vCPU can only be associated with one pCPU at a time; the
association is established by invoking SEAMCALL TDH.VP.ENTER. Before a new
association can succeed, the vCPU must be dissociated from its previously
associated pCPU.

To facilitate vCPU dissociation, introduce a per-pCPU list,
associated_tdvcpus. Add a vCPU to this list when it is loaded onto a new
pCPU (i.e. when a vCPU is loaded for the first time or migrated to a new
pCPU).

vCPU dissociation can happen under the following conditions:

- When the hardware_disable op is called. This op is called when
  virtualization is disabled on a given pCPU, e.g. when hot-unplugging a
  pCPU or during machine shutdown/suspend. In this case, dissociate all
  vCPUs from the pCPU by iterating its per-pCPU list associated_tdvcpus.

- On vCPU migration to a new pCPU. Before adding a vCPU to the
  associated_tdvcpus list of the new pCPU, dissociation from its old pCPU
  is required. This is performed by issuing an IPI and executing SEAMCALL
  TDH.VP.FLUSH on the old pCPU. On a successful dissociation, the vCPU is
  removed from the associated_tdvcpus list of its previously associated
  pCPU.

- When tdx_mmu_release_hkid() is called. TDX mandates that all vCPUs be
  disassociated prior to the release of an HKID. Therefore, all vCPUs must
  be dissociated before executing SEAMCALL TDH.MNG.VPFLUSHDONE and
  subsequently freeing the HKID.
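Schematically, the migration case pairs an IPI-driven flush on the old pCPU
with a list insertion on the new one. The comment below is a condensed
sketch of the code added by this patch, not additional API:

/*
 * Condensed view of the migration protocol implemented below:
 *
 *   old pCPU (IPI callback, IRQs off)    new pCPU (tdx_vcpu_load)
 *   ---------------------------------    --------------------------------
 *   tdh_vp_flush(vcpu);                  tdx_flush_vp_on_cpu(vcpu); // IPI
 *   list_del(&tdx->cpu_list);            smp_rmb(); // pairs with smp_wmb()
 *   smp_wmb();                           list_add(&tdx->cpu_list,
 *   vcpu->cpu = -1;                          &per_cpu(associated_tdvcpus, cpu));
 */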
Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Message-ID: <20241112073858.22312-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/main.c    |  22 ++++-
 arch/x86/kvm/vmx/tdx.c     | 159 +++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/vmx/tdx.h     |   2 +
 arch/x86/kvm/vmx/x86_ops.h |   4 +
 4 files changed, 177 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 828168e67d4e..abb0fc723a0b 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -10,6 +10,14 @@
 #include "tdx.h"
 #include "tdx_arch.h"
 
+static void vt_disable_virtualization_cpu(void)
+{
+	/* Note, TDX *and* VMX need to be disabled if TDX is enabled. */
+	if (enable_tdx)
+		tdx_disable_virtualization_cpu();
+	vmx_disable_virtualization_cpu();
+}
+
 static __init int vt_hardware_setup(void)
 {
 	int ret;
@@ -111,6 +119,16 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vmx_vcpu_reset(vcpu, init_event);
 }
 
+static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	if (is_td_vcpu(vcpu)) {
+		tdx_vcpu_load(vcpu, cpu);
+		return;
+	}
+
+	vmx_vcpu_load(vcpu, cpu);
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu)) {
@@ -199,7 +217,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.hardware_unsetup = vmx_hardware_unsetup,
 
 	.enable_virtualization_cpu = vmx_enable_virtualization_cpu,
-	.disable_virtualization_cpu = vmx_disable_virtualization_cpu,
+	.disable_virtualization_cpu = vt_disable_virtualization_cpu,
 	.emergency_disable_virtualization_cpu = vmx_emergency_disable_virtualization_cpu,
 
 	.has_emulated_msr = vmx_has_emulated_msr,
@@ -216,7 +234,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.vcpu_reset = vt_vcpu_reset,
 
 	.prepare_switch_to_guest = vmx_prepare_switch_to_guest,
-	.vcpu_load = vmx_vcpu_load,
+	.vcpu_load = vt_vcpu_load,
 	.vcpu_put = vmx_vcpu_put,
 
 	.update_exception_bitmap = vmx_update_exception_bitmap,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 246f924ae6f6..a6696448cb53 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -162,6 +162,21 @@ static bool tdx_operand_busy(u64 err)
 }
 
+/*
+ * A per-CPU list of TD vCPUs associated with a given CPU.
+ * Protected by interrupt mask. Only manipulated by the CPU owning this per-CPU
+ * list.
+ * - When a vCPU is loaded onto a CPU, it is removed from the per-CPU list of
+ *   the old CPU during the IPI callback running on the old CPU, and then added
+ *   to the per-CPU list of the new CPU.
+ * - When a TD is tearing down, all vCPUs are disassociated from their current
+ *   running CPUs and removed from the per-CPU list during the IPI callback
+ *   running on those CPUs.
+ * - When a CPU is brought down, traverse the per-CPU list to disassociate all
+ *   associated TD vCPUs and remove them from the per-CPU list.
+ */
+static DEFINE_PER_CPU(struct list_head, associated_tdvcpus);
+
 static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx)
 {
 	tdx_guest_keyid_free(kvm_tdx->hkid);
@@ -177,6 +192,22 @@ static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx)
 	return kvm_tdx->hkid > 0;
 }
 
+static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
+{
+	lockdep_assert_irqs_disabled();
+
+	list_del(&to_tdx(vcpu)->cpu_list);
+
+	/*
+	 * Ensure tdx->cpu_list is updated before setting vcpu->cpu to -1,
+	 * otherwise, a different CPU can see vcpu->cpu = -1 and add the vCPU
+	 * to its list before it's deleted from this CPU's list.
+	 */
+	smp_wmb();
+
+	vcpu->cpu = -1;
+}
+
 static void tdx_clear_page(struct page *page)
 {
 	const void *zero_page = (const void *) page_to_virt(ZERO_PAGE(0));
@@ -243,6 +274,83 @@ static void tdx_reclaim_control_page(struct page *ctrl_page)
 	__free_page(ctrl_page);
 }
 
+struct tdx_flush_vp_arg {
+	struct kvm_vcpu *vcpu;
+	u64 err;
+};
+
+static void tdx_flush_vp(void *_arg)
+{
+	struct tdx_flush_vp_arg *arg = _arg;
+	struct kvm_vcpu *vcpu = arg->vcpu;
+	u64 err;
+
+	arg->err = 0;
+	lockdep_assert_irqs_disabled();
+
+	/* Task migration can race with CPU offlining. */
+	if (unlikely(vcpu->cpu != raw_smp_processor_id()))
+		return;
+
+	/*
+	 * No need to do TDH_VP_FLUSH if the vCPU hasn't been initialized. The
+	 * list tracking still needs to be updated so that it's correct if/when
+	 * the vCPU does get initialized.
+	 */
+	if (to_tdx(vcpu)->state != VCPU_TD_STATE_UNINITIALIZED) {
+		/*
+		 * No need to retry. TDX Resources needed for TDH.VP.FLUSH are:
+		 * TDVPR as exclusive, TDR as shared, and TDCS as shared. This
+		 * vp flush function is called when destructing vCPU/TD or vCPU
+		 * migration. No other thread uses TDVPR in those cases.
+		 */
+		err = tdh_vp_flush(&to_tdx(vcpu)->vp);
+		if (unlikely(err && err != TDX_VCPU_NOT_ASSOCIATED)) {
+			/*
+			 * This function is called in IPI context. Do not use
+			 * printk to avoid console semaphore.
+			 * The caller prints out the error message, instead.
+			 */
+			if (err)
+				arg->err = err;
+		}
+	}
+
+	tdx_disassociate_vp(vcpu);
+}
+
+static void tdx_flush_vp_on_cpu(struct kvm_vcpu *vcpu)
+{
+	struct tdx_flush_vp_arg arg = {
+		.vcpu = vcpu,
+	};
+	int cpu = vcpu->cpu;
+
+	if (unlikely(cpu == -1))
+		return;
+
+	smp_call_function_single(cpu, tdx_flush_vp, &arg, 1);
+	if (KVM_BUG_ON(arg.err, vcpu->kvm))
+		pr_tdx_error(TDH_VP_FLUSH, arg.err);
+}
+
+void tdx_disable_virtualization_cpu(void)
+{
+	int cpu = raw_smp_processor_id();
+	struct list_head *tdvcpus = &per_cpu(associated_tdvcpus, cpu);
+	struct tdx_flush_vp_arg arg;
+	struct vcpu_tdx *tdx, *tmp;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	/* Safe variant needed as tdx_disassociate_vp() deletes the entry. */
+	list_for_each_entry_safe(tdx, tmp, tdvcpus, cpu_list) {
+		arg.vcpu = &tdx->vcpu;
+		tdx_flush_vp(&arg);
+	}
+	local_irq_restore(flags);
+}
+
 #define TDX_SEAMCALL_RETRIES 10000
 
 static void smp_func_do_phymem_cache_wb(void *unused)
@@ -281,22 +389,21 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
 	bool packages_allocated, targets_allocated;
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	cpumask_var_t packages, targets;
-	u64 err;
+	struct kvm_vcpu *vcpu;
+	unsigned long j;
 	int i;
+	u64 err;
 
 	if (!is_hkid_assigned(kvm_tdx))
 		return;
 
-	/* KeyID has been allocated but guest is not yet configured */
-	if (!kvm_tdx->td.tdr_page) {
-		tdx_hkid_free(kvm_tdx);
-		return;
-	}
-
 	packages_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL);
 	targets_allocated = zalloc_cpumask_var(&targets, GFP_KERNEL);
 	cpus_read_lock();
 
+	kvm_for_each_vcpu(j, vcpu, kvm)
+		tdx_flush_vp_on_cpu(vcpu);
+
 	/*
 	 * TDH.PHYMEM.CACHE.WB tries to acquire the TDX module global lock
 	 * and can fail with TDX_OPERAND_BUSY when it fails to get the lock.
@@ -310,6 +417,16 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
 	 * After the above flushing vps, there should be no more vCPU
 	 * associations, as all vCPU fds have been released at this stage.
	 */
+	err = tdh_mng_vpflushdone(&kvm_tdx->td);
+	if (err == TDX_FLUSHVP_NOT_DONE)
+		goto out;
+	if (KVM_BUG_ON(err, kvm)) {
+		pr_tdx_error(TDH_MNG_VPFLUSHDONE, err);
+		pr_err("tdh_mng_vpflushdone() failed. HKID %d is leaked.\n",
+		       kvm_tdx->hkid);
+		goto out;
+	}
+
 	for_each_online_cpu(i) {
 		if (packages_allocated &&
 		    cpumask_test_and_set_cpu(topology_physical_package_id(i),
@@ -335,6 +452,7 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
 		tdx_hkid_free(kvm_tdx);
 	}
 
+out:
 	mutex_unlock(&tdx_lock);
 	cpus_read_unlock();
 	free_cpumask_var(targets);
@@ -483,6 +601,27 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	if (vcpu->cpu == cpu || !is_hkid_assigned(to_kvm_tdx(vcpu->kvm)))
+		return;
+
+	tdx_flush_vp_on_cpu(vcpu);
+
+	KVM_BUG_ON(cpu != raw_smp_processor_id(), vcpu->kvm);
+	local_irq_disable();
+	/*
+	 * Pairs with the smp_wmb() in tdx_disassociate_vp() to ensure
+	 * vcpu->cpu is read before tdx->cpu_list.
+	 */
+	smp_rmb();
+
+	list_add(&tdx->cpu_list, &per_cpu(associated_tdvcpus, cpu));
+	local_irq_enable();
+}
+
 void tdx_vcpu_free(struct kvm_vcpu *vcpu)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
@@ -2038,7 +2177,11 @@ void tdx_cleanup(void)
 
 int __init tdx_bringup(void)
 {
-	int r;
+	int r, i;
+
+	/* tdx_disable_virtualization_cpu() uses associated_tdvcpus. */
+	for_each_possible_cpu(i)
+		INIT_LIST_HEAD(&per_cpu(associated_tdvcpus, i));
 
 	if (!enable_tdx)
 		return 0;
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index b71f8e9643e4..8ed96f980d9a 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -47,6 +47,8 @@ struct vcpu_tdx {
 
 	struct tdx_vp vp;
 
+	struct list_head cpu_list;
+
 	enum vcpu_tdx_state state;
 };
 
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index bc3cf1c1da37..6ab10d3cdb6f 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -122,6 +122,7 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu);
 void vmx_setup_mce(struct kvm_vcpu *vcpu);
 
 #ifdef CONFIG_INTEL_TDX_HOST
+void tdx_disable_virtualization_cpu(void);
 int tdx_vm_init(struct kvm *kvm);
 void tdx_mmu_release_hkid(struct kvm *kvm);
 void tdx_vm_destroy(struct kvm *kvm);
@@ -129,6 +130,7 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 
 int tdx_vcpu_create(struct kvm_vcpu *vcpu);
 void tdx_vcpu_free(struct kvm_vcpu *vcpu);
+void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
 
@@ -146,6 +148,7 @@ void tdx_flush_tlb_all(struct kvm_vcpu *vcpu);
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level);
 int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
 #else
+static inline void tdx_disable_virtualization_cpu(void) {}
 static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; }
 static inline void tdx_mmu_release_hkid(struct kvm *kvm) {}
 static inline void tdx_vm_destroy(struct kvm *kvm) {}
@@ -153,6 +156,7 @@ static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOP
 static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
 static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
+static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {}
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }

From patchwork Wed Feb 26 19:55:25 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993121
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe
Subject: [PATCH 25/29] KVM: Add parameter "kvm" to kvm_cpu_dirty_log_size() and its callers
Date: Wed, 26 Feb 2025 14:55:25 -0500
Message-ID: <20250226195529.2314580-26-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Yan Zhao

Add a parameter "kvm" to kvm_cpu_dirty_log_size() and pass it down to its
callers: kvm_dirty_ring_get_rsvd_entries() and kvm_dirty_ring_alloc().

This is a preparation to make cpu_dirty_log_size a per-VM value rather than
a system-wide value. No functional changes expected.
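Condensed, the parameter threading after this patch looks like the sketch
below (a view of the resulting call chain, not new code):

/*
 * kvm_vm_ioctl_create_vcpu(kvm, id)
 *   -> kvm_dirty_ring_alloc(kvm, &vcpu->dirty_ring, id, kvm->dirty_ring_size)
 *        -> kvm_dirty_ring_get_rsvd_entries(kvm)
 *             -> KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size(kvm)
 *
 * so that the per-ring soft limit becomes:
 *   ring->soft_limit = ring->size - kvm_dirty_ring_get_rsvd_entries(kvm);
 */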
Signed-off-by: Yan Zhao
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/mmu/mmu.c         |  2 +-
 include/linux/kvm_dirty_ring.h | 11 ++++++-----
 virt/kvm/dirty_ring.c          | 11 ++++++-----
 virt/kvm/kvm_main.c            |  4 ++--
 4 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 443c4836bc62..3caeaebda637 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1311,7 +1311,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 		kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
-int kvm_cpu_dirty_log_size(void)
+int kvm_cpu_dirty_log_size(struct kvm *kvm)
 {
 	return kvm_x86_ops.cpu_dirty_log_size;
 }
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 4862c98d80d3..da4d9b5f58f1 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -32,7 +32,7 @@ struct kvm_dirty_ring {
  * If CONFIG_HAVE_HVM_DIRTY_RING not defined, kvm_dirty_ring.o should
  * not be included as well, so define these nop functions for the arch.
  */
-static inline u32 kvm_dirty_ring_get_rsvd_entries(void)
+static inline u32 kvm_dirty_ring_get_rsvd_entries(struct kvm *kvm)
 {
 	return 0;
 }
@@ -42,7 +42,7 @@ static inline bool kvm_use_dirty_bitmap(struct kvm *kvm)
 	return true;
 }
 
-static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
+static inline int kvm_dirty_ring_alloc(struct kvm *kvm, struct kvm_dirty_ring *ring,
 				       int index, u32 size)
 {
 	return 0;
@@ -71,11 +71,12 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
-int kvm_cpu_dirty_log_size(void);
+int kvm_cpu_dirty_log_size(struct kvm *kvm);
 bool kvm_use_dirty_bitmap(struct kvm *kvm);
 bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm);
-u32 kvm_dirty_ring_get_rsvd_entries(void);
-int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
+u32 kvm_dirty_ring_get_rsvd_entries(struct kvm *kvm);
+int kvm_dirty_ring_alloc(struct kvm *kvm, struct kvm_dirty_ring *ring,
+			 int index, u32 size);
 
 /*
  * called with kvm->slots_lock held, returns the number of
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 7bc74969a819..d14ffc7513ee 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -11,14 +11,14 @@
 #include <...>
 #include "kvm_mm.h"
 
-int __weak kvm_cpu_dirty_log_size(void)
+int __weak kvm_cpu_dirty_log_size(struct kvm *kvm)
 {
 	return 0;
 }
 
-u32 kvm_dirty_ring_get_rsvd_entries(void)
+u32 kvm_dirty_ring_get_rsvd_entries(struct kvm *kvm)
 {
-	return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size();
+	return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size(kvm);
 }
 
 bool kvm_use_dirty_bitmap(struct kvm *kvm)
@@ -74,14 +74,15 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
 	KVM_MMU_UNLOCK(kvm);
 }
 
-int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size)
+int kvm_dirty_ring_alloc(struct kvm *kvm, struct kvm_dirty_ring *ring,
+			 int index, u32 size)
 {
 	ring->dirty_gfns = vzalloc(size);
 	if (!ring->dirty_gfns)
 		return -ENOMEM;
 
 	ring->size = size / sizeof(struct kvm_dirty_gfn);
-	ring->soft_limit = ring->size - kvm_dirty_ring_get_rsvd_entries();
+	ring->soft_limit = ring->size - kvm_dirty_ring_get_rsvd_entries(kvm);
 	ring->dirty_index = 0;
 	ring->reset_index = 0;
 	ring->index = index;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c88acfa154e6..3187b2ae0e5b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4109,7 +4109,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 		goto vcpu_free_run_page;
 
 	if (kvm->dirty_ring_size) {
-		r = kvm_dirty_ring_alloc(&vcpu->dirty_ring,
+		r = kvm_dirty_ring_alloc(kvm, &vcpu->dirty_ring,
 					 id, kvm->dirty_ring_size);
 		if (r)
 			goto arch_vcpu_destroy;
@@ -4848,7 +4848,7 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
 		return -EINVAL;
 
 	/* Should be bigger to keep the reserved entries, or a page */
-	if (size < kvm_dirty_ring_get_rsvd_entries() *
+	if (size < kvm_dirty_ring_get_rsvd_entries(kvm) *
 	    sizeof(struct kvm_dirty_gfn) || size < PAGE_SIZE)
 		return -EINVAL;

From patchwork Wed Feb 26 19:55:26 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993122
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe
Subject: [PATCH 26/29] KVM: x86/mmu: Add parameter "kvm" to kvm_mmu_page_ad_need_write_protect()
Date: Wed, 26 Feb 2025 14:55:26 -0500
Message-ID: <20250226195529.2314580-27-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Yan Zhao

Add a parameter "kvm" to kvm_mmu_page_ad_need_write_protect() and its
caller tdp_mmu_need_write_protect().

This is a preparation to make cpu_dirty_log_size a per-VM value rather than
a system-wide value. No functional changes expected.
Signed-off-by: Yan Zhao
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/mmu/mmu_internal.h |  3 ++-
 arch/x86/kvm/mmu/spte.c         |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.c      | 12 ++++++------
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 75f00598289d..86d6d4f82cf4 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -187,7 +187,8 @@ static inline gfn_t kvm_gfn_root_bits(const struct kvm *kvm, const struct kvm_mm
 	return kvm_gfn_direct_bits(kvm);
 }
 
-static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
+static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm *kvm,
+						      struct kvm_mmu_page *sp)
 {
 	/*
 	 * When using the EPT page-modification log, the GPAs in the CPU dirty
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index e819d16655b6..a609d5b58b69 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -168,7 +168,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 
 	if (sp->role.ad_disabled)
 		spte |= SPTE_TDP_AD_DISABLED;
-	else if (kvm_mmu_page_ad_need_write_protect(sp))
+	else if (kvm_mmu_page_ad_need_write_protect(vcpu->kvm, sp))
 		spte |= SPTE_TDP_AD_WRPROT_ONLY;
 
 	spte |= shadow_present_mask;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 22675a5746d0..fd0a7792386b 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1613,21 +1613,21 @@ void kvm_tdp_mmu_try_split_huge_pages(struct kvm *kvm,
 	}
 }
 
-static bool tdp_mmu_need_write_protect(struct kvm_mmu_page *sp)
+static bool tdp_mmu_need_write_protect(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	/*
 	 * All TDP MMU shadow pages share the same role as their root, aside
 	 * from level, so it is valid to key off any shadow page to determine if
 	 * write protection is needed for an entire tree.
	 */
-	return kvm_mmu_page_ad_need_write_protect(sp) || !kvm_ad_enabled;
+	return kvm_mmu_page_ad_need_write_protect(kvm, sp) || !kvm_ad_enabled;
 }
 
 static void clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
 				  gfn_t start, gfn_t end)
 {
-	const u64 dbit = tdp_mmu_need_write_protect(root) ? PT_WRITABLE_MASK :
-							    shadow_dirty_mask;
+	const u64 dbit = tdp_mmu_need_write_protect(kvm, root) ?
+			 PT_WRITABLE_MASK : shadow_dirty_mask;
 	struct tdp_iter iter;
 
 	rcu_read_lock();
@@ -1672,8 +1672,8 @@ void kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm,
 static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
 				  gfn_t gfn, unsigned long mask, bool wrprot)
 {
-	const u64 dbit = (wrprot || tdp_mmu_need_write_protect(root)) ? PT_WRITABLE_MASK :
-									shadow_dirty_mask;
+	const u64 dbit = (wrprot || tdp_mmu_need_write_protect(kvm, root)) ?
+			 PT_WRITABLE_MASK : shadow_dirty_mask;
 	struct tdp_iter iter;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);

From patchwork Wed Feb 26 19:55:27 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993124
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe, ANAND NARSHINHA PATIL,
    Pedro Principeza, Farrah Chen, Kai Huang
Subject: [PATCH 27/29] KVM: x86: Make cpu_dirty_log_size a per-VM value
Date: Wed, 26 Feb 2025 14:55:27 -0500
Message-ID: <20250226195529.2314580-28-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Yan Zhao

Make cpu_dirty_log_size (the CPU's dirty log buffer size) a per-VM value
and set the per-VM cpu_dirty_log_size only for normal VMs when PML is
enabled. Do not set it for TDs.

Until now, cpu_dirty_log_size was a system-wide value that is used for all
VMs and is set to the PML buffer size when PML is enabled in VMX. However,
PML is not currently supported for TDs, though PML remains available for
normal VMs as long as the feature is supported by hardware and enabled in
VMX.

Making cpu_dirty_log_size a per-VM value allows it to be the PML buffer
size for normal VMs and 0 for TDs. This allows functions like
kvm_arch_sync_dirty_log() and kvm_mmu_update_cpu_dirty_logging() to
determine if PML is supported, in order to kick off vCPUs or request them
to update CPU dirty logging status (turn on/off PML in VMCS).

This fixes an issue first reported in [1], where QEMU attaches an emulated
VGA device to a TD; note that KVM_MEM_LOG_DIRTY_PAGES still works if the
corresponding memslot has no KVM_MEM_GUEST_MEMFD flag. KVM then invokes
kvm_mmu_update_cpu_dirty_logging() and from there
vmx_update_cpu_dirty_logging(), which incorrectly accesses a kvm_vmx
struct for a TDX VM.
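Condensed, the per-VM gating introduced here works as sketched below (not
new code; the names are those used in the hunks that follow):

/*
 * After this patch:
 *   - vmx_vm_init() sets, for normal VMs only:
 *         if (enable_pml)
 *                 kvm->arch.cpu_dirty_log_size = PML_LOG_NR_ENTRIES;
 *     TDs never set it, so it stays 0.
 *   - Common code then keys off the per-VM value, e.g.:
 *         if (!kvm->arch.cpu_dirty_log_size)
 *                 return;   // PML unsupported/disabled for this VM
 *     which keeps kvm_mmu_update_cpu_dirty_logging() from ever reaching
 *     vmx_update_cpu_dirty_logging() for a TDX VM.
 */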
-	 */
-	int cpu_dirty_log_size;
 	void (*update_cpu_dirty_logging)(struct kvm_vcpu *vcpu);

 	const struct kvm_x86_nested_ops *nested_ops;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3caeaebda637..e6eb3a262f8d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1305,7 +1305,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	 * enabled but it chooses between clearing the Dirty bit and Writeable
 	 * bit based on the context.
 	 */
-	if (kvm_x86_ops.cpu_dirty_log_size)
+	if (kvm->arch.cpu_dirty_log_size)
 		kvm_mmu_clear_dirty_pt_masked(kvm, slot, gfn_offset, mask);
 	else
 		kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
@@ -1313,7 +1313,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,

 int kvm_cpu_dirty_log_size(struct kvm *kvm)
 {
-	return kvm_x86_ops.cpu_dirty_log_size;
+	return kvm->arch.cpu_dirty_log_size;
 }

 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 86d6d4f82cf4..db8f33e4de62 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -198,7 +198,7 @@ static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm *kvm,
 	 * being enabled is mandatory as the bits used to denote WP-only SPTEs
 	 * are reserved for PAE paging (32-bit KVM).
 	 */
-	return kvm_x86_ops.cpu_dirty_log_size && sp->role.guest_mode;
+	return kvm->arch.cpu_dirty_log_size && sp->role.guest_mode;
 }

 static inline gfn_t gfn_round_for_level(gfn_t gfn, int level)
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index abb0fc723a0b..ba3a23747bb1 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -322,7 +322,6 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.check_intercept = vmx_check_intercept,
 	.handle_exit_irqoff = vmx_handle_exit_irqoff,

-	.cpu_dirty_log_size = PML_LOG_NR_ENTRIES,
 	.update_cpu_dirty_logging = vmx_update_cpu_dirty_logging,

 	.nested_ops = &vmx_nested_ops,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 18150a38fe09..b71125a1b5cf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7653,6 +7653,9 @@ int vmx_vm_init(struct kvm *kvm)
 			break;
 		}
 	}
+
+	if (enable_pml)
+		kvm->arch.cpu_dirty_log_size = PML_LOG_NR_ENTRIES;
 	return 0;
 }

@@ -8506,9 +8509,6 @@ __init int vmx_hardware_setup(void)
 	if (!enable_ept || !enable_ept_ad_bits || !cpu_has_vmx_pml())
 		enable_pml = 0;

-	if (!enable_pml)
-		vt_x86_ops.cpu_dirty_log_size = 0;
-
 	if (!cpu_has_vmx_preemption_timer())
 		enable_preemption_timer = false;

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e981e24820a0..edf47b8cb949 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6468,7 +6468,7 @@ void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 	struct kvm_vcpu *vcpu;
 	unsigned long i;

-	if (!kvm_x86_ops.cpu_dirty_log_size)
+	if (!kvm->arch.cpu_dirty_log_size)
 		return;

 	kvm_for_each_vcpu(i, vcpu, kvm)
@@ -13055,7 +13055,7 @@ static void kvm_mmu_update_cpu_dirty_logging(struct kvm *kvm, bool enable)
 {
 	int nr_slots;

-	if (!kvm_x86_ops.cpu_dirty_log_size)
+	if (!kvm->arch.cpu_dirty_log_size)
 		return;

 	nr_slots = atomic_read(&kvm->nr_memslots_dirty_logging);
@@ -13127,7 +13127,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 		if (READ_ONCE(eager_page_split))
 			kvm_mmu_slot_try_split_huge_pages(kvm, new, PG_LEVEL_4K);

-		if (kvm_x86_ops.cpu_dirty_log_size) {
+		if (kvm->arch.cpu_dirty_log_size) {
 			kvm_mmu_slot_leaf_clear_dirty(kvm, new);
 			kvm_mmu_slot_remove_write_access(kvm, new, PG_LEVEL_2M);
 		} else {
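
To see the effect of the change in isolation, the following is a minimal
userspace sketch of the new per-VM check. The struct definitions,
vm_init() and dirty_log_strategy() are simplified stand-ins (not part of
the patch) that fold the TD/VMX split into a single check; the real code
spreads it across vmx_vm_init() and the MMU helpers above.

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins; PML_LOG_NR_ENTRIES is the VMX PML buffer size. */
#define PML_LOG_NR_ENTRIES 512

struct kvm_arch { int cpu_dirty_log_size; };
struct kvm { bool is_td; struct kvm_arch arch; };

/*
 * Models the vmx_vm_init() change: the PML buffer size is recorded per VM
 * at creation time.  In the kernel, TDs never take the VMX init path; here
 * that dispatch is folded into one check.
 */
static void vm_init(struct kvm *kvm, bool enable_pml)
{
	if (enable_pml && !kvm->is_td)
		kvm->arch.cpu_dirty_log_size = PML_LOG_NR_ENTRIES;
}

/* Models kvm_arch_mmu_enable_log_dirty_pt_masked(): branch per VM. */
static const char *dirty_log_strategy(const struct kvm *kvm)
{
	return kvm->arch.cpu_dirty_log_size ? "clear Dirty bit (PML)"
					    : "write-protect";
}

int main(void)
{
	struct kvm vm = { .is_td = false }, td = { .is_td = true };

	vm_init(&vm, true);
	vm_init(&td, true);
	printf("normal VM: %s\n", dirty_log_strategy(&vm));
	printf("TD:        %s\n", dirty_log_strategy(&td));
	return 0;
}

Because a TD simply leaves cpu_dirty_log_size at zero, every existing
"is PML available" check falls through to the write-protect path without
any TDX-specific knowledge in the common MMU code.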
From patchwork Wed Feb 26 19:55:28 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993120
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe
Subject: [PATCH 28/29] KVM: TDX: Skip updating CPU dirty logging request for TDs
Date: Wed, 26 Feb 2025 14:55:28 -0500
Message-ID: <20250226195529.2314580-29-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

Wrap vmx_update_cpu_dirty_logging() so that requests to update CPU dirty
logging are ignored for TDs, as basic TDX does not support the PML
feature. Invoking vmx_update_cpu_dirty_logging() for a TD would cause an
incorrect access to a kvm_vmx struct for a TDX VM, so block that before
it happens.

Signed-off-by: Yan Zhao
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/main.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index ba3a23747bb1..ec8223ee9d28 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -129,6 +129,18 @@ static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	vmx_vcpu_load(vcpu, cpu);
 }

+static void vt_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Basic TDX does not support feature PML. KVM does not enable PML in
+	 * TD's VMCS, nor does it allocate or flush PML buffer for TDX.
+	 */
+	if (WARN_ON_ONCE(is_td_vcpu(vcpu)))
+		return;
+
+	vmx_update_cpu_dirty_logging(vcpu);
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu)) {
@@ -322,7 +334,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.check_intercept = vmx_check_intercept,
 	.handle_exit_irqoff = vmx_handle_exit_irqoff,

-	.update_cpu_dirty_logging = vmx_update_cpu_dirty_logging,
+	.update_cpu_dirty_logging = vt_update_cpu_dirty_logging,

 	.nested_ops = &vmx_nested_ops,
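
The wrapper follows the vt_* dispatch pattern used throughout main.c:
guard on the vCPU type, warn once if the unsupported path is reached, and
otherwise delegate to the VMX implementation. Below is a minimal
userspace model of that pattern; warn_on_once() is a crude stand-in for
the kernel's WARN_ON_ONCE(), and the struct and function bodies are
illustrative only.

#include <stdbool.h>
#include <stdio.h>

struct kvm_vcpu { bool is_td; };

/* One-shot warning: report the first occurrence, always return cond. */
static bool warn_on_once(bool cond, const char *msg)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "WARN: %s\n", msg);
	}
	return cond;
}

/* Stand-in for the VMX implementation, which touches kvm_vmx state
 * that simply does not exist for a TDX vCPU. */
static void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
{
	(void)vcpu;
	puts("update PML enable bit in the VMCS");
}

/* Models vt_update_cpu_dirty_logging(): TDs never reach the VMX body. */
static void vt_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
{
	if (warn_on_once(vcpu->is_td, "PML update requested for a TD"))
		return;

	vmx_update_cpu_dirty_logging(vcpu);
}

int main(void)
{
	struct kvm_vcpu vmx_vcpu = { .is_td = false };
	struct kvm_vcpu td_vcpu = { .is_td = true };

	vt_update_cpu_dirty_logging(&vmx_vcpu);	/* delegates to VMX */
	vt_update_cpu_dirty_logging(&td_vcpu);	/* warns once, returns */
	return 0;
}

With the previous patch making cpu_dirty_log_size zero for TDs, this path
should be unreachable for a TD; the WARN_ON_ONCE() is a belt-and-braces
check rather than expected control flow.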
From patchwork Wed Feb 26 19:55:29 2025
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13993123
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, Yan Zhao, Rick Edgecombe
Subject: [PATCH 29/29] KVM: TDX: Handle SEPT zap error due to page add error in premap
Date: Wed, 26 Feb 2025 14:55:29 -0500
Message-ID: <20250226195529.2314580-30-pbonzini@redhat.com>
In-Reply-To: <20250226195529.2314580-1-pbonzini@redhat.com>
References: <20250226195529.2314580-1-pbonzini@redhat.com>

From: Yan Zhao

Move the handling of SEPT zap errors caused by unsuccessful execution of
tdh_mem_page_add() in KVM_TDX_INIT_MEM_REGION from
tdx_sept_drop_private_spte() to tdx_sept_zap_private_spte(). Introduce a
new helper function tdx_is_sept_zap_err_due_to_premap() to detect this
specific error.

During the KVM_TDX_INIT_MEM_REGION ioctl, KVM premaps leaf SPTEs in the
mirror page table before the corresponding entry in the private page
table is successfully installed by tdh_mem_page_add(). If an error occurs
during the invocation of tdh_mem_page_add(), the resulting mismatch
between the mirror and private page tables causes SEAMCALLs for SEPT zap
to return the error code TDX_EPT_ENTRY_STATE_INCORRECT.

The error TDX_EPT_WALK_FAILED is not possible because, during
KVM_TDX_INIT_MEM_REGION, KVM only premaps leaf SPTEs after successfully
mapping non-leaf SPTEs. Unlike leaf SPTEs, there is no mismatch in
non-leaf PTEs between the mirror and private page tables.
Therefore, during zap, SEAMCALLs should find an empty leaf entry in the
private EPT, leading to the error TDX_EPT_ENTRY_STATE_INCORRECT instead
of TDX_EPT_WALK_FAILED.

Since tdh_mem_range_block() is always invoked before
tdh_mem_page_remove(), move the handling of SEPT zap errors from
tdx_sept_drop_private_spte() to tdx_sept_zap_private_spte(). In
tdx_sept_zap_private_spte(), return 0 for errors due to premap in order
to skip the remaining zap SEAMCALLs, which are unnecessary in that case.
Return 1 to indicate no other errors, allowing execution of the
remaining zap SEAMCALLs to continue.

The failure of tdh_mem_page_add() is uncommon and has not been observed
in real workloads. Currently, this failure is only hypothetically
triggered by skipping the real SEAMCALL and faking the add error in the
SEAMCALL wrapper. Additionally, without this fix, there are no host
crashes or other severe issues.

Signed-off-by: Yan Zhao
Message-ID: <20250217085642.19696-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/tdx.c | 66 ++++++++++++++++++++++++++++++------------
 1 file changed, 47 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index a6696448cb53..f2cbd8c440e9 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -775,20 +775,6 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
 				       &level_state);
 	} while (unlikely(tdx_operand_busy(err)));

-	if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE &&
-		     err == (TDX_EPT_WALK_FAILED | TDX_OPERAND_ID_RCX))) {
-		/*
-		 * Page is mapped by KVM_TDX_INIT_MEM_REGION, but hasn't called
-		 * tdh_mem_page_add().
-		 */
-		if ((!is_last_spte(entry, level) || !(entry & VMX_EPT_RWX_MASK)) &&
-		    !KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm)) {
-			atomic64_dec(&kvm_tdx->nr_premapped);
-			tdx_unpin(kvm, page);
-			return 0;
-		}
-	}
-
 	if (KVM_BUG_ON(err, kvm)) {
 		pr_tdx_error_2(TDH_MEM_PAGE_REMOVE, err, entry, level_state);
 		return -EIO;
@@ -826,8 +812,41 @@ int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }

+/*
+ * Check if the error returned from a SEPT zap SEAMCALL is due to a page
+ * being mapped by KVM_TDX_INIT_MEM_REGION without tdh_mem_page_add() having
+ * been called successfully.
+ *
+ * Since tdh_mem_sept_add() must have been invoked successfully before a
+ * non-leaf entry is present in the mirrored page table, the SEPT zap related
+ * SEAMCALLs should not encounter err TDX_EPT_WALK_FAILED. They should instead
+ * find TDX_EPT_ENTRY_STATE_INCORRECT due to an empty leaf entry found in the
+ * SEPT.
+ *
+ * Further check if the entry returned from the SEPT walk has RWX permissions
+ * to filter out anything unexpected.
+ *
+ * Note: @level is pg_level, not the tdx_level. The tdx_level extracted from
+ * level_state returned from a SEAMCALL error is the same as that passed into
+ * the SEAMCALL.
+ */
+static int tdx_is_sept_zap_err_due_to_premap(struct kvm_tdx *kvm_tdx, u64 err,
+					     u64 entry, int level)
+{
+	if (!err || kvm_tdx->state == TD_STATE_RUNNABLE)
+		return false;
+
+	if (err != (TDX_EPT_ENTRY_STATE_INCORRECT | TDX_OPERAND_ID_RCX))
+		return false;
+
+	if ((is_last_spte(entry, level) && (entry & VMX_EPT_RWX_MASK)))
+		return false;
+
+	return true;
+}
+
 static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn,
-				     enum pg_level level)
+				     enum pg_level level, struct page *page)
 {
 	int tdx_level = pg_level_to_tdx_sept_level(level);
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
@@ -840,11 +859,19 @@ static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn,
 	err = tdh_mem_range_block(&kvm_tdx->td, gpa, tdx_level, &entry, &level_state);
 	if (unlikely(tdx_operand_busy(err)))
 		return -EBUSY;
+
+	if (tdx_is_sept_zap_err_due_to_premap(kvm_tdx, err, entry, level) &&
+	    !KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm)) {
+		atomic64_dec(&kvm_tdx->nr_premapped);
+		tdx_unpin(kvm, page);
+		return 0;
+	}
+
 	if (KVM_BUG_ON(err, kvm)) {
 		pr_tdx_error_2(TDH_MEM_RANGE_BLOCK, err, entry, level_state);
 		return -EIO;
 	}
-	return 0;
+	return 1;
 }

 /*
@@ -918,6 +945,7 @@ int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
 int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 				 enum pg_level level, kvm_pfn_t pfn)
 {
+	struct page *page = pfn_to_page(pfn);
 	int ret;

 	/*
@@ -928,8 +956,8 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm))
 		return -EINVAL;

-	ret = tdx_sept_zap_private_spte(kvm, gfn, level);
-	if (ret)
+	ret = tdx_sept_zap_private_spte(kvm, gfn, level, page);
+	if (ret <= 0)
 		return ret;

 	/*
@@ -938,7 +966,7 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	 */
 	tdx_track(kvm);

-	return tdx_sept_drop_private_spte(kvm, gfn, level, pfn_to_page(pfn));
+	return tdx_sept_drop_private_spte(kvm, gfn, level, page);
 }

 static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
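
The refactoring gives tdx_sept_zap_private_spte() a three-way return
contract that the caller relies on: negative for errors (-EBUSY, -EIO),
0 when the premap case was fully handled (skip the remaining zap
SEAMCALLs), and 1 to continue with tdx_track() and the page removal. A
small sketch of that control flow; the enum names and values are
illustrative stand-ins, not the kernel's actual error codes.

#include <stdio.h>

/* Stand-in values modeling the zap helper's three-way return contract. */
enum zap_result {
	ZAP_BUSY    = -16,	/* -EBUSY: SEAMCALL contention, retry later */
	ZAP_ERR     = -5,	/* -EIO: unexpected SEAMCALL failure */
	ZAP_PREMAP  = 0,	/* premap error consumed, nothing left to do */
	ZAP_BLOCKED = 1,	/* range blocked, continue the teardown */
};

/* Mirrors the caller's new "if (ret <= 0) return ret;" logic. */
static int remove_private_spte(enum zap_result zap_ret)
{
	if (zap_ret <= 0)
		return zap_ret;	/* error, or premap case already handled */

	puts("tdx_track() + tdh_mem_page_remove()");
	return 0;
}

int main(void)
{
	remove_private_spte(ZAP_BLOCKED);	/* prints: full teardown */
	remove_private_spte(ZAP_PREMAP);	/* silent: premap consumed */
	remove_private_spte(ZAP_BUSY);		/* silent: error propagated */
	return 0;
}

Returning 0 for the premap case lets the caller bail out without running
tdx_track() or tdh_mem_page_remove(), which would be pointless (and would
fail) for an entry that was never successfully added.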