From patchwork Thu Sep 14 01:55:16 2023
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 13384025
Reply-To: Sean Christopherson
Date: Wed, 13 Sep 2023 18:55:16 -0700
In-Reply-To: <20230914015531.1419405-1-seanjc@google.com>
References: <20230914015531.1419405-1-seanjc@google.com>
X-Mailer: git-send-email 2.42.0.283.g2d96d420d3-goog
Message-ID: <20230914015531.1419405-19-seanjc@google.com>
Subject: [RFC PATCH v12 18/33] KVM: x86/mmu: Handle page fault for private memory
From: Sean Christopherson
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman,
 Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Sean Christopherson,
 "Matthew Wilcox (Oracle)", Andrew Morton, Paul Moore, James Morris,
 "Serge E. Hallyn"
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
 linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-security-module@vger.kernel.org,
 linux-kernel@vger.kernel.org, Chao Peng, Fuad Tabba, Jarkko Sakkinen,
 Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka,
 Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand,
 Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata,
 "Kirill A. Shutemov"
From: Chao Peng

A KVM_MEM_PRIVATE memslot can include both fd-based private memory and
hva-based shared memory.  Architecture code (e.g. TDX code) can tell
whether the ongoing fault is private or not.  Add an 'is_private' field
to struct kvm_page_fault to indicate this; architecture code is expected
to set it.

To handle a page fault on such a memslot, the handling logic differs
depending on whether the fault is private or shared.  KVM checks whether
'is_private' matches the host's view of the page (maintained in
mem_attr_array):

  - On a match, the private pfn is obtained with kvm_gmem_get_pfn() and
    the shared pfn with the existing __gfn_to_pfn_memslot().

  - On a mismatch, KVM exits to userspace with KVM_EXIT_MEMORY_FAULT.
    Userspace can then convert the memory between private and shared in
    the host's view and retry the fault.
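For context, a rough userspace-side sketch of the convert-and-retry flow
described above (not taken from this patch): it assumes the
KVM_SET_MEMORY_ATTRIBUTES ioctl, struct kvm_memory_attributes, and the
kvm_run::memory_fault exit layout introduced elsewhere in this series, so
the exact field and flag names may differ:

  /*
   * Illustrative VMM-side handler for KVM_EXIT_MEMORY_FAULT: flip the
   * faulting range to the attribute the guest demanded and retry.  The
   * uAPI names used here (KVM_SET_MEMORY_ATTRIBUTES, kvm_memory_attributes,
   * memory_fault, KVM_MEMORY_EXIT_FLAG_PRIVATE) come from other patches in
   * this series, not from this patch, and are assumptions in this sketch.
   */
  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  static int handle_memory_fault_exit(int vm_fd, struct kvm_run *run)
  {
  	struct kvm_memory_attributes attrs = {
  		.address = run->memory_fault.gpa,
  		.size    = run->memory_fault.size,
  		/* Private if the guest faulted on private memory, else shared. */
  		.attributes = (run->memory_fault.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE) ?
  			      KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
  	};

  	/* Update the host's view (mem_attr_array) so the retried fault matches. */
  	return ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
  }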
Co-developed-by: Yu Zhang
Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c          | 94 +++++++++++++++++++++++++++++++--
 arch/x86/kvm/mmu/mmu_internal.h |  1 +
 2 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a079f36a8bf5..9b48d8d0300b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3147,9 +3147,9 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return level;
 }
 
-int kvm_mmu_max_mapping_level(struct kvm *kvm,
-			      const struct kvm_memory_slot *slot, gfn_t gfn,
-			      int max_level)
+static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
+				       const struct kvm_memory_slot *slot,
+				       gfn_t gfn, int max_level, bool is_private)
 {
 	struct kvm_lpage_info *linfo;
 	int host_level;
@@ -3161,6 +3161,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 		break;
 	}
 
+	if (is_private)
+		return max_level;
+
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
@@ -3168,6 +3171,16 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 	return min(host_level, max_level);
 }
 
+int kvm_mmu_max_mapping_level(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot, gfn_t gfn,
+			      int max_level)
+{
+	bool is_private = kvm_slot_can_be_private(slot) &&
+			  kvm_mem_is_private(kvm, gfn);
+
+	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, max_level, is_private);
+}
+
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
@@ -3188,8 +3201,9 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * Enforce the iTLB multihit workaround after capturing the requested
 	 * level, which will be used to do precise, accurate accounting.
 	 */
-	fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
-						     fault->gfn, fault->max_level);
+	fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
+						       fault->gfn, fault->max_level,
+						       fault->is_private);
 	if (fault->req_level == PG_LEVEL_4K ||
 	    fault->huge_page_disallowed)
 		return;
@@ -4261,6 +4275,55 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL);
 }
 
+static inline u8 kvm_max_level_for_order(int order)
+{
+	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
+
+	KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
+		return PG_LEVEL_1G;
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+		return PG_LEVEL_2M;
+
+	return PG_LEVEL_4K;
+}
+
+static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
+					      struct kvm_page_fault *fault)
+{
+	kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
+				      PAGE_SIZE, fault->write, fault->exec,
+				      fault->is_private);
+}
+
+static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
+				   struct kvm_page_fault *fault)
+{
+	int max_order, r;
+
+	if (!kvm_slot_can_be_private(fault->slot)) {
+		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+		return -EFAULT;
+	}
+
+	r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
+			     &max_order);
+	if (r) {
+		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+		return r;
+	}
+
+	fault->max_level = min(kvm_max_level_for_order(max_order),
+			       fault->max_level);
+	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
+
+	return RET_PF_CONTINUE;
+}
+
 static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
@@ -4293,6 +4356,14 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		return RET_PF_EMULATE;
 	}
 
+	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+		return -EFAULT;
+	}
+
+	if (fault->is_private)
+		return kvm_faultin_pfn_private(vcpu, fault);
+
 	async = false;
 	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async,
 					  fault->write, &fault->map_writable,
@@ -7184,6 +7255,19 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
 }
 
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
+					struct kvm_gfn_range *range)
+{
+	/*
+	 * KVM x86 currently only supports KVM_MEMORY_ATTRIBUTE_PRIVATE, skip
+	 * the slot if the slot will never consume the PRIVATE attribute.
+	 */
+	if (!kvm_slot_can_be_private(range->slot))
+		return false;
+
+	return kvm_mmu_unmap_gfn_range(kvm, range);
+}
+
 static bool hugepage_test_mixed(struct kvm_memory_slot *slot, gfn_t gfn,
 				int level)
 {
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index b102014e2c60..4efbf43b4b18 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -202,6 +202,7 @@ struct kvm_page_fault {
 
 	/* Derived from mmu and global state.  */
 	const bool is_tdp;
+	const bool is_private;
 	const bool nx_huge_page_workaround_enabled;
 
 	/*