From patchwork Tue Jul 18 23:44:57 2023
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13317883
Subject: [RFC PATCH v11 14/29] KVM: x86/mmu: Handle page fault for private memory
From: Sean Christopherson
Reply-To: Sean Christopherson
Date: Tue, 18 Jul 2023 16:44:57 -0700
Message-ID: <20230718234512.1690985-15-seanjc@google.com>
In-Reply-To: <20230718234512.1690985-1-seanjc@google.com>
References: <20230718234512.1690985-1-seanjc@google.com>
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman,
    Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Sean Christopherson,
    "Matthew Wilcox (Oracle)", Andrew Morton, Paul Moore, James Morris,
    "Serge E. Hallyn"
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
    linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-security-module@vger.kernel.org,
    linux-kernel@vger.kernel.org, Chao Peng, Fuad Tabba, Jarkko Sakkinen,
    Yu Zhang, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero,
    Vlastimil Babka, David Hildenbrand, Quentin Perret, Michael Roth,
    Wang, Liam Merwick, Isaku Yamahata, "Kirill A. Shutemov"

Shutemov" X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 10C3840017 X-Stat-Signature: 5q59n7ne8eib987y9w33jrwmmwko3ewe X-Rspam-User: X-HE-Tag: 1689724139-464977 X-HE-Meta: U2FsdGVkX1+aTohSyz29T+g+CDEphNlubhbtQxve6mqD9IYSsocI20jvhydpr4xqyG9OhPWC/usVbqVdau0I8JxMDGBGIwq0JzAAy2fq3/4FGOcaFoE34JLlwXLbzvKTu0dwbavK+qfO+rIH63+sWZu4AJ3FWtsWSdUIbK25BJb0KOA9iEMMpWzDQlttymWqoaHX1Ef2isVIrWWHdMeTqaisAz7ucSGM8ZxTNKpREB5F2YVh1/FJWDpP62dMvhb4qCkRLbNt/a5OcJk8kPcNU1Uw+WFMcqFb9L4Y6Oek7m482ebcpW9LqsjIYfQt5G4GV4bwrukeNKY62Ong9HT6uxnT+HeMSzvUYD7ztvBdsGNBrJMTBVZsMKFGL8ZDKsIYFNMV8PWNQ5kDTJlIiu+JFNvYc+j0Z1f5uBkun1c8WLI2MDXBrmVhAHdX4ICzoDnMOpgc/A2Yu8aCdQ8iyjiydlvvRl7bOkA78JyKQknJpnRkCxCYpgOqOk7iwvg3qCAwUdKB14UHxtSUxXrfR3tKs53rHvAmUlMuGNedzKEW4rWOyNuW53d//A2xlpNa2GNW5JZERyGIDbI0W/HRFVKT+o//+4y5hNmOgtayQYo1AWPvHtzq2e2mNSxcaE4wp1MibfqRBC9BgnwBlseeh9RQ7wSyTkWXNDNIoWvWnsVL16VhTZ9Vk4osaySpQO4dPDJgplX82QJGcBGAS8hCHvkvBS/77XCk7imDX3P+sIOMd4iutM5ojFlpp88FjeeUoLneHBe9uCOs8r5idqPCeWq3Hj9OGw5GN/N/18nA+SvqwOtP6CkZO+SzOzG+mXS1cY+Mk8+Z8HsGWzqn+TTS/Y7U/U99hU9Sl5spX2OQlEV2ZeJA+wmOjJcuL0l3WMWylR/YKRlZ9dFKuZPNbzMdfhVMIS4NncJP/GqTJ0PCaEWOHijCYkQgP0fDEd3cFAxj3UB09IfGy7n3P1Y4mnEtm8a pw3cWvYZ beEuS889IvBH9KmnMT/x+8UA+3Mc5PzUuLYrLMaKHUiYZhnR8vaWFhDBxaClLm7bBEPpSNBycU7n8P9EIjTva1Mae3E38RGPbsAITJUAhp4ac7F4wyqad3rCq96AJPfWBSm/yNleilRW8s912BAQ468S03swRjdtPKlBm+AAR0rDpluO1Xjtfl5dzU20Ba8x97JT2GXlNl1WEAtRRADFb75DzIMtOekSio9gadKGuJdE6E0RdW6Be2olm6DIQ16dvj4SEJkD+iQZfdi4a8WQV+E1icXX9KMID5yKIZQNo65rpq+DDQwSuzjbf4bdDhKxr3Z8KqqA3SPUZPPDMMWAqgVgdbJl/HjSoNnCVf0xQUBx7RWi5MAhGdquUoLJmS2SeVHmPiM+3RCK3gBJgzBJx8xuOZpavuyOX5ivPT4DLYIPrLpdTZHNal8bKsneBmrDBmURGxlPv7FAmZSG7E1QifRSAK6D+wkPZMn97j5Dc8M959EAEeaVHxpA05HolSZ37gQindOXHYbZw8dHGR02W2FYcKHwq7x+T+coNFPT0vTmBsPcUdg3wcaR3LnNxemrcXij+lNnF9KUtAeBK9Ov3jZWS+c5jazIjeOsnxfZ3vZgd6yuzom8mBV7tveZ/A+J/hfsrB1POHi7YGarAgs13+w/b1yigsqZsv21yb0kOdOy/KDC12EwyAFyYySALzPSuPdPoKv87g3Wie9Z31ncMvklHHyN8qnnIB+xmrsTe9T5i2bZk1zQlefmxIeEmEcYDO6M4bCofxC2y3VfZI1AHA34bqPfNsQtEpVA15RKMGPcYCiY= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Chao Peng A KVM_MEM_PRIVATE memslot can include both fd-based private memory and hva-based shared memory. Architecture code (like TDX code) can tell whether the on-going fault is private or not. This patch adds a 'is_private' field to kvm_page_fault to indicate this and architecture code is expected to set it. To handle page fault for such memslot, the handling logic is different depending on whether the fault is private or shared. KVM checks if 'is_private' matches the host's view of the page (maintained in mem_attr_array). - For a successful match, private pfn is obtained with restrictedmem_get_page() and shared pfn is obtained with existing get_user_pages(). - For a failed match, KVM causes a KVM_EXIT_MEMORY_FAULT exit to userspace. Userspace then can convert memory between private/shared in host's view and retry the fault. 
Co-developed-by: Yu Zhang
Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
Reviewed-by: Fuad Tabba
Tested-by: Fuad Tabba
Signed-off-by: Sean Christopherson
Reviewed-by: Paolo Bonzini
---
 arch/x86/kvm/mmu/mmu.c          | 82 +++++++++++++++++++++++++++++++--
 arch/x86/kvm/mmu/mmu_internal.h |  3 ++
 arch/x86/kvm/mmu/mmutrace.h     |  1 +
 3 files changed, 81 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index aefe67185637..4cf73a579ee1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3179,9 +3179,9 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return level;
 }
 
-int kvm_mmu_max_mapping_level(struct kvm *kvm,
-			      const struct kvm_memory_slot *slot, gfn_t gfn,
-			      int max_level)
+static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
+				       const struct kvm_memory_slot *slot,
+				       gfn_t gfn, int max_level, bool is_private)
 {
 	struct kvm_lpage_info *linfo;
 	int host_level;
@@ -3193,6 +3193,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			break;
 	}
 
+	if (is_private)
+		return max_level;
+
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
@@ -3200,6 +3203,16 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 	return min(host_level, max_level);
 }
 
+int kvm_mmu_max_mapping_level(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot, gfn_t gfn,
+			      int max_level)
+{
+	bool is_private = kvm_slot_can_be_private(slot) &&
+			  kvm_mem_is_private(kvm, gfn);
+
+	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, max_level, is_private);
+}
+
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
@@ -3220,8 +3233,9 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * Enforce the iTLB multihit workaround after capturing the requested
 	 * level, which will be used to do precise, accurate accounting.
 	 */
-	fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
-						     fault->gfn, fault->max_level);
+	fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
+						       fault->gfn, fault->max_level,
+						       fault->is_private);
 	if (fault->req_level == PG_LEVEL_4K ||
 	    fault->huge_page_disallowed)
 		return;
@@ -4304,6 +4318,55 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL);
 }
 
+static inline u8 kvm_max_level_for_order(int order)
+{
+	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
+
+	MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
+		    order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
+		    order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
+		return PG_LEVEL_1G;
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+		return PG_LEVEL_2M;
+
+	return PG_LEVEL_4K;
+}
+
+static int kvm_do_memory_fault_exit(struct kvm_vcpu *vcpu,
+				    struct kvm_page_fault *fault)
+{
+	vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
+	if (fault->is_private)
+		vcpu->run->memory.flags = KVM_MEMORY_EXIT_FLAG_PRIVATE;
+	else
+		vcpu->run->memory.flags = 0;
+	vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
+	vcpu->run->memory.size = PAGE_SIZE;
+	return RET_PF_USER;
+}
+
+static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
+				   struct kvm_page_fault *fault)
+{
+	int max_order, r;
+
+	if (!kvm_slot_can_be_private(fault->slot))
+		return kvm_do_memory_fault_exit(vcpu, fault);
+
+	r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
+			     &max_order);
+	if (r)
+		return r;
+
+	fault->max_level = min(kvm_max_level_for_order(max_order),
+			       fault->max_level);
+	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
+	return RET_PF_CONTINUE;
+}
+
 static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
@@ -4336,6 +4399,12 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		return RET_PF_EMULATE;
 	}
 
+	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn))
+		return kvm_do_memory_fault_exit(vcpu, fault);
+
+	if (fault->is_private)
+		return kvm_faultin_pfn_private(vcpu, fault);
+
 	async = false;
 	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async,
 					  fault->write, &fault->map_writable,
@@ -5771,6 +5840,9 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 		return -EIO;
 	}
 
+	if (r == RET_PF_USER)
+		return 0;
+
 	if (r < 0)
 		return r;
 	if (r != RET_PF_EMULATE)
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index d39af5639ce9..268b517e88cb 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -203,6 +203,7 @@ struct kvm_page_fault {
 
 	/* Derived from mmu and global state.  */
 	const bool is_tdp;
+	const bool is_private;
 	const bool nx_huge_page_workaround_enabled;
 
 	/*
@@ -259,6 +260,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
  * RET_PF_RETRY: let CPU fault again on the address.
  * RET_PF_EMULATE: mmio page fault, emulate the instruction directly.
  * RET_PF_INVALID: the spte is invalid, let the real page fault path update it.
+ * RET_PF_USER: need to exit to userspace to handle this fault.
  * RET_PF_FIXED: The faulting entry has been fixed.
  * RET_PF_SPURIOUS: The faulting entry was already fixed, e.g. by another vCPU.
  *
@@ -275,6 +277,7 @@ enum {
 	RET_PF_RETRY,
 	RET_PF_EMULATE,
 	RET_PF_INVALID,
+	RET_PF_USER,
 	RET_PF_FIXED,
 	RET_PF_SPURIOUS,
 };
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index ae86820cef69..2d7555381955 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -58,6 +58,7 @@ TRACE_DEFINE_ENUM(RET_PF_CONTINUE);
 TRACE_DEFINE_ENUM(RET_PF_RETRY);
 TRACE_DEFINE_ENUM(RET_PF_EMULATE);
 TRACE_DEFINE_ENUM(RET_PF_INVALID);
+TRACE_DEFINE_ENUM(RET_PF_USER);
 TRACE_DEFINE_ENUM(RET_PF_FIXED);
 TRACE_DEFINE_ENUM(RET_PF_SPURIOUS);
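
As an aside, below is a very rough sketch of what the userspace side of the
new exit could look like. This is hypothetical VMM code, not part of this
patch: it assumes the KVM_SET_MEMORY_ATTRIBUTES ioctl, struct
kvm_memory_attributes, and KVM_MEMORY_ATTRIBUTE_PRIVATE introduced earlier in
this series, and the handle_memory_fault_exit()/vm_fd names are made up for
illustration.

#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Illustrative handler for KVM_EXIT_MEMORY_FAULT: convert the faulting range
 * to the attribute the fault asked for, then re-enter the vCPU (KVM_RUN) so
 * KVM retries the fault.  Error handling is omitted for brevity.
 */
static int handle_memory_fault_exit(int vm_fd, struct kvm_run *run)
{
	struct kvm_memory_attributes attrs = {
		.address    = run->memory.gpa,
		.size       = run->memory.size,
		.attributes = (run->memory.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE) ?
			      KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
	};

	/* Flip the range in the host's view; the vCPU is then run again. */
	return ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
}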