From patchwork Wed Sep 4 03:07:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13789650 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9DA0759155; Wed, 4 Sep 2024 03:14:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725419686; cv=none; b=hJUoa63wSEelDWWCQnhZGmSw3yCIp1oZsv6hXmgsgHywA127GopD0ondb2llj+/D+naC6ZJo92XrRuI8nSFfmIkEMzkWBq/SXR/KM9bdciVcQ++tN+OJvLqYEgZdwiPN5HsBrAjjHIfMq7NzR+/KyKBbPlpAdoYwNRlhc0wEPfY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725419686; c=relaxed/simple; bh=nndpORrLQUk9Aw9kC1XdXsRMcRLXZY0XeYbkhI+fpss=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=j1Q5NB9CTB1K46MHG4icmuYqJEnFg8OvZBMID8PVb/fdfohoKQKs+xfYUmwQJ1KeEkrAr/ozTtkI992p42ApVGyRWosz5ZrQ/DAg6+yLWek7rfhOUTZNMhsUeeF78nnkzE4fwpKi+QoEcNWLcwVaOjj9z840yFrJOrSt74+roro= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=iPspYo3y; arc=none smtp.client-ip=192.198.163.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="iPspYo3y" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1725419684; x=1756955684; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=nndpORrLQUk9Aw9kC1XdXsRMcRLXZY0XeYbkhI+fpss=; b=iPspYo3yE1lTQ8fWs45pNfskf/QFMqfN3GtyKl3snQrXe8Va4E2pxco9 dOP/gQt1hoA3tR3gd41HWXELPJko63ofO6CG3hf/iEyKiwCiGmkdtjkqN QMWoxZq7i8qv6OBXh58eUYG+HpU65AjXRQ89kedRSS0U4B0PTQtabuXQI LBCGB4J3cLD3V0yBDcBJBehpyUYTc5Qnvx5ZwH4tZf1/cko7il4htWcZJ EqOxvBxzEjTlec+WhNtqmFNo6zOvpioI8ruWkuTqOCfVa6m3SubIeEonN 5gblLq9P/2VsMC0zhisoTr0FetdMfh/Z0e13WV/fgsAXRJbusXJDo9XPV Q==; X-CSE-ConnectionGUID: FfUnSgV1Rk66qyDQWF+dxw== X-CSE-MsgGUID: Fd0AdSxmTyqLXxE1N75Jhw== X-IronPort-AV: E=McAfee;i="6700,10204,11184"; a="23564726" X-IronPort-AV: E=Sophos;i="6.10,200,1719903600"; d="scan'208";a="23564726" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Sep 2024 20:08:12 -0700 X-CSE-ConnectionGUID: 9X7FYh3AQ4m3KLeB7qTFfw== X-CSE-MsgGUID: dZJbLmi5QpOEJJuCCkqtew== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,200,1719903600"; d="scan'208";a="65106375" Received: from dgramcko-desk.amr.corp.intel.com (HELO rpedgeco-desk4..) ([10.124.221.153]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Sep 2024 20:08:12 -0700 From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com, yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org Subject: [PATCH 19/21] KVM: TDX: Add an ioctl to create initial guest memory Date: Tue, 3 Sep 2024 20:07:49 -0700 Message-Id: <20240904030751.117579-20-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240904030751.117579-1-rick.p.edgecombe@intel.com> References: <20240904030751.117579-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Add a new ioctl for the user space VMM to initialize guest memory with the specified memory contents. Because TDX protects the guest's memory, the creation of the initial guest memory requires a dedicated TDX module API, TDH.MEM.PAGE.ADD(), instead of directly copying the memory contents into the guest's memory in the case of the default VM type. Define a new subcommand, KVM_TDX_INIT_MEM_REGION, of vCPU-scoped KVM_MEMORY_ENCRYPT_OP. Check if the GFN is already pre-allocated, assign the guest page in Secure-EPT, copy the initial memory contents into the guest memory, and encrypt the guest memory. Optionally, extend the memory measurement of the TDX guest. Discussion history: - Originally, KVM_TDX_INIT_MEM_REGION used the callback of the TDP MMU of the KVM page fault handler. It issues TDX SEAMCALL deep in the call stack, and the ioctl passes down the necessary parameters. [2] rejected it. [3] suggests that the call to the TDX module should be invoked in a shallow call stack. - Instead, introduce guest memory pre-population [1] that doesn't update vendor-specific part (Secure-EPT in TDX case) and the vendor-specific code (KVM_TDX_INIT_MEM_REGION) updates only vendor-specific parts without modifying the KVM TDP MMU suggested at [4] Crazy idea. For TDX S-EPT, what if KVM_MAP_MEMORY does all of the SEPT.ADD stuff, which doesn't affect the measurement, and even fills in KVM's copy of the leaf EPTE, but tdx_sept_set_private_spte() doesn't do anything if the TD isn't finalized? Then KVM provides a dedicated TDX ioctl(), i.e. what is/was KVM_TDX_INIT_MEM_REGION, to do PAGE.ADD. KVM_TDX_INIT_MEM_REGION wouldn't need to map anything, it would simply need to verify that the pfn from guest_memfd() is the same as what's in the TDP MMU. - Use the common guest_memfd population function, kvm_gmem_populate() instead of a custom function. It should check whether the PFN from TDP MMU is the same as the one from guest_memfd. [1] - Instead of forcing userspace to do two passes, pre-map the guest initial memory in tdx_gmem_post_populate. [5] Link: https://lore.kernel.org/kvm/20240419085927.3648704-1-pbonzini@redhat.com/ [1] Link: https://lore.kernel.org/kvm/Zbrj5WKVgMsUFDtb@google.com/ [2] Link: https://lore.kernel.org/kvm/Zh8DHbb8FzoVErgX@google.com/ [3] Link: https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@google.com/ [4] Link: https://lore.kernel.org/kvm/CABgObfa=a3cKcKJHQRrCs-3Ty8ppSRou=dhi6Q+KdZnom0Zegw@mail.gmail.com/ [5] Suggested-by: Sean Christopherson Signed-off-by: Isaku Yamahata Co-developed-by: Yan Zhao Signed-off-by: Yan Zhao Signed-off-by: Rick Edgecombe --- TDX MMU part 2 v1: - Update the code according to latest gmem update. https://lore.kernel.org/kvm/CABgObfa=a3cKcKJHQRrCs-3Ty8ppSRou=dhi6Q+KdZnom0Zegw@mail.gmail.com/ - Fixup a aligment bug reported by Binbin. - Rename KVM_MEMORY_MAPPING => KVM_MAP_MEMORY (Sean) - Drop issueing TDH.MEM.PAGE.ADD() on KVM_MAP_MEMORY(), defer it to KVM_TDX_INIT_MEM_REGION. (Sean) - Added nr_premapped to track the number of premapped pages - Drop tdx_post_mmu_map_page(). - Drop kvm_slot_can_be_private() check (Paolo) - Use kvm_tdp_mmu_gpa_is_mapped() (Paolo) v19: - Switched to use KVM_MEMORY_MAPPING - Dropped measurement extension - updated commit message. private_page_add() => set_private_spte() --- arch/x86/include/uapi/asm/kvm.h | 9 ++ arch/x86/kvm/vmx/tdx.c | 150 ++++++++++++++++++++++++++++++++ virt/kvm/kvm_main.c | 1 + 3 files changed, 160 insertions(+) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 39636be5c891..789d1d821b4f 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES = 0, KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, + KVM_TDX_INIT_MEM_REGION, KVM_TDX_GET_CPUID, KVM_TDX_CMD_NR_MAX, @@ -996,4 +997,12 @@ struct kvm_tdx_init_vm { struct kvm_cpuid2 cpuid; }; +#define KVM_TDX_MEASURE_MEMORY_REGION _BITULL(0) + +struct kvm_tdx_init_mem_region { + __u64 source_addr; + __u64 gpa; + __u64 nr_pages; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 50ce24905062..796d1a495a66 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -8,6 +8,7 @@ #include "tdx_ops.h" #include "vmx.h" #include "mmu/spte.h" +#include "common.h" #undef pr_fmt #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt @@ -1586,6 +1587,152 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd) return 0; } +struct tdx_gmem_post_populate_arg { + struct kvm_vcpu *vcpu; + __u32 flags; +}; + +static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, + void __user *src, int order, void *_arg) +{ + u64 error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS; + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + struct tdx_gmem_post_populate_arg *arg = _arg; + struct kvm_vcpu *vcpu = arg->vcpu; + gpa_t gpa = gfn_to_gpa(gfn); + u8 level = PG_LEVEL_4K; + struct page *page; + int ret, i; + u64 err, entry, level_state; + + /* + * Get the source page if it has been faulted in. Return failure if the + * source page has been swapped out or unmapped in primary memory. + */ + ret = get_user_pages_fast((unsigned long)src, 1, 0, &page); + if (ret < 0) + return ret; + if (ret != 1) + return -ENOMEM; + + if (!kvm_mem_is_private(kvm, gfn)) { + ret = -EFAULT; + goto out_put_page; + } + + ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level); + if (ret < 0) + goto out_put_page; + + read_lock(&kvm->mmu_lock); + + if (!kvm_tdp_mmu_gpa_is_mapped(vcpu, gpa)) { + ret = -ENOENT; + goto out; + } + + ret = 0; + do { + err = tdh_mem_page_add(kvm_tdx, gpa, pfn_to_hpa(pfn), + pfn_to_hpa(page_to_pfn(page)), + &entry, &level_state); + } while (err == TDX_ERROR_SEPT_BUSY); + if (err) { + ret = -EIO; + goto out; + } + + WARN_ON_ONCE(!atomic64_read(&kvm_tdx->nr_premapped)); + atomic64_dec(&kvm_tdx->nr_premapped); + + if (arg->flags & KVM_TDX_MEASURE_MEMORY_REGION) { + for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) { + err = tdh_mr_extend(kvm_tdx, gpa + i, &entry, + &level_state); + if (err) { + ret = -EIO; + break; + } + } + } + +out: + read_unlock(&kvm->mmu_lock); +out_put_page: + put_page(page); + return ret; +} + +static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd) +{ + struct kvm *kvm = vcpu->kvm; + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + struct kvm_tdx_init_mem_region region; + struct tdx_gmem_post_populate_arg arg; + long gmem_ret; + int ret; + + if (!to_tdx(vcpu)->initialized) + return -EINVAL; + + /* Once TD is finalized, the initial guest memory is fixed. */ + if (is_td_finalized(kvm_tdx)) + return -EINVAL; + + if (cmd->flags & ~KVM_TDX_MEASURE_MEMORY_REGION) + return -EINVAL; + + if (copy_from_user(®ion, u64_to_user_ptr(cmd->data), sizeof(region))) + return -EFAULT; + + if (!PAGE_ALIGNED(region.source_addr) || !PAGE_ALIGNED(region.gpa) || + !region.nr_pages || + region.gpa + (region.nr_pages << PAGE_SHIFT) <= region.gpa || + !kvm_is_private_gpa(kvm, region.gpa) || + !kvm_is_private_gpa(kvm, region.gpa + (region.nr_pages << PAGE_SHIFT) - 1)) + return -EINVAL; + + mutex_lock(&kvm->slots_lock); + + kvm_mmu_reload(vcpu); + ret = 0; + while (region.nr_pages) { + if (signal_pending(current)) { + ret = -EINTR; + break; + } + + arg = (struct tdx_gmem_post_populate_arg) { + .vcpu = vcpu, + .flags = cmd->flags, + }; + gmem_ret = kvm_gmem_populate(kvm, gpa_to_gfn(region.gpa), + u64_to_user_ptr(region.source_addr), + 1, tdx_gmem_post_populate, &arg); + if (gmem_ret < 0) { + ret = gmem_ret; + break; + } + + if (gmem_ret != 1) { + ret = -EIO; + break; + } + + region.source_addr += PAGE_SIZE; + region.gpa += PAGE_SIZE; + region.nr_pages--; + + cond_resched(); + } + + mutex_unlock(&kvm->slots_lock); + + if (copy_to_user(u64_to_user_ptr(cmd->data), ®ion, sizeof(region))) + ret = -EFAULT; + return ret; +} + int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); @@ -1605,6 +1752,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) case KVM_TDX_INIT_VCPU: ret = tdx_vcpu_init(vcpu, &cmd); break; + case KVM_TDX_INIT_MEM_REGION: + ret = tdx_vcpu_init_mem_region(vcpu, &cmd); + break; case KVM_TDX_GET_CPUID: ret = tdx_vcpu_get_cpuid(vcpu, &cmd); break; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 73fc3334721d..0822db480719 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2639,6 +2639,7 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn return NULL; } +EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_memslot); bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn) {