From patchwork Thu Aug 1 09:01:15 2024
X-Patchwork-Submitter: Fuad Tabba
X-Patchwork-Id: 13749999
Date: Thu, 1 Aug 2024 10:01:15 +0100
In-Reply-To: <20240801090117.3841080-1-tabba@google.com>
References: <20240801090117.3841080-1-tabba@google.com>
X-Mailer: git-send-email 2.46.0.rc1.232.g9752f9e123-goog
Message-ID: <20240801090117.3841080-9-tabba@google.com>
Subject: [RFC PATCH v2 08/10] KVM: arm64: Handle guest_memfd()-backed guest
 page faults
From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
Cc: pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
 anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
 aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
 brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
 xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
 yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net,
 vbabka@suse.cz, vannapurve@google.com, ackerleytng@google.com,
 mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com,
 wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, catalin.marinas@arm.com,
 james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev,
 maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com,
 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
 rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
 tabba@google.com
Add arm64 support for resolving guest page faults on guest_memfd()-backed
memslots. This support is not contingent on pKVM or any other confidential
computing support, and it works in both VHE and nVHE modes.

Without confidential computing, this support is useful for testing and
debugging. In the future, it might also prove useful should a user want
to back all of a guest's memory with guest_memfd(), whether the guest is
protected or not.

For now, the fault granule is restricted to PAGE_SIZE.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/mmu.c | 127 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 125 insertions(+), 2 deletions(-)
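As context for reviewers (illustrative only, not part of this patch): the
new path below is taken for memslots bound to a guest_memfd(). A minimal
userspace sketch of that setup, assuming the KVM_CREATE_GUEST_MEMFD and
KVM_SET_USER_MEMORY_REGION2 uAPI from the base guest_memfd() series, with
error handling pared down:

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Create a guest_memfd() and bind it to a memslot at @gpa. */
	static int create_gmem_slot(int vm_fd, __u64 gpa, __u64 size, __u32 slot_id)
	{
		struct kvm_create_guest_memfd gmem = { .size = size };
		int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

		if (gmem_fd < 0)
			return gmem_fd;

		struct kvm_userspace_memory_region2 region = {
			.slot = slot_id,
			.flags = KVM_MEM_GUEST_MEMFD,
			.guest_phys_addr = gpa,
			.memory_size = size,
			.guest_memfd = gmem_fd,
			.guest_memfd_offset = 0,
		};

		/*
		 * KVM_MEM_GUEST_MEMFD makes kvm_slot_can_be_private() true,
		 * so stage-2 faults on this slot resolve via the new
		 * guest_memfd_abort() below rather than user_mem_abort().
		 */
		return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
	}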
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index b1fc636fb670..e15167865cab 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1378,6 +1378,123 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
+static int guest_memfd_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			     struct kvm_memory_slot *memslot, bool fault_is_perm)
+{
+	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
+	bool exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
+	bool logging_active = memslot_is_logging(memslot);
+	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
+	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
+	bool write_fault = kvm_is_write_fault(vcpu);
+	struct mm_struct *mm = current->mm;
+	gfn_t gfn = gpa_to_gfn(fault_ipa);
+	struct kvm *kvm = vcpu->kvm;
+	unsigned long mmu_seq;
+	struct page *page;
+	kvm_pfn_t pfn;
+	int ret;
+
+	/* For now, guest_memfd() only supports PAGE_SIZE granules. */
+	if (WARN_ON_ONCE(fault_is_perm &&
+			 kvm_vcpu_trap_get_perm_fault_granule(vcpu) != PAGE_SIZE)) {
+		return -EFAULT;
+	}
+
+	VM_BUG_ON(write_fault && exec_fault);
+
+	if (fault_is_perm && !write_fault && !exec_fault) {
+		kvm_err("Unexpected L2 read permission error\n");
+		return -EFAULT;
+	}
+
+	/*
+	 * Permission faults just need to update the existing leaf entry,
+	 * and so normally don't require allocations from the memcache. The
+	 * only exception to this is when dirty logging is enabled at runtime
+	 * and a write fault needs to collapse a block entry into a table.
+	 */
+	if (!fault_is_perm || (logging_active && write_fault)) {
+		ret = kvm_mmu_topup_memory_cache(memcache,
+						 kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
+		if (ret)
+			return ret;
+	}
+
+	/*
+	 * Read mmu_invalidate_seq so that KVM can detect if the results of
+	 * kvm_gmem_get_pfn_locked() become stale prior to acquiring
+	 * kvm->mmu_lock.
+	 */
+	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+
+	/* To pair with the smp_wmb() in kvm_mmu_invalidate_end(). */
+	smp_rmb();
+
+	ret = kvm_gmem_get_pfn_locked(kvm, memslot, gfn, &pfn, NULL);
+	if (ret)
+		return ret;
+
+	page = pfn_to_page(pfn);
+
+	if (!kvm_gmem_is_mappable(kvm, gfn, gfn + 1) &&
+	    (page_mapped(page) || page_maybe_dma_pinned(page))) {
+		ret = -EPERM;
+		goto unlock_page;
+	}
+
+	/*
+	 * Once it's faulted in, a guest_memfd() page will stay in memory.
+	 * Therefore, count it as locked.
+	 */
+	if (!fault_is_perm) {
+		ret = account_locked_vm(mm, 1, true);
+		if (ret)
+			goto unlock_page;
+	}
+
+	read_lock(&kvm->mmu_lock);
+	if (mmu_invalidate_retry(kvm, mmu_seq)) {
+		ret = -EAGAIN;
+		goto unlock_mmu;
+	}
+
+	if (write_fault)
+		prot |= KVM_PGTABLE_PROT_W;
+
+	if (exec_fault)
+		prot |= KVM_PGTABLE_PROT_X;
+
+	if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
+		prot |= KVM_PGTABLE_PROT_X;
+
+	/*
+	 * Under the premise of getting a FSC_PERM fault, we just need to relax
+	 * permissions.
+	 */
+	if (fault_is_perm)
+		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
+	else
+		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, PAGE_SIZE,
+					     __pfn_to_phys(pfn), prot,
+					     memcache,
+					     KVM_PGTABLE_WALK_HANDLE_FAULT |
+					     KVM_PGTABLE_WALK_SHARED);
+
+	/* Mark the page dirty only if the fault is handled successfully. */
+	if (write_fault && !ret) {
+		kvm_set_pfn_dirty(pfn);
+		mark_page_dirty_in_slot(kvm, memslot, gfn);
+	}
+
+unlock_mmu:
+	read_unlock(&kvm->mmu_lock);
+
+	if (ret && !fault_is_perm)
+		account_locked_vm(mm, 1, false);
+
+unlock_page:
+	unlock_page(page);
+	put_page(page);
+
+	return ret != -EAGAIN ? ret : 0;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  bool fault_is_perm)
@@ -1748,8 +1865,14 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		goto out_unlock;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva,
-			     esr_fsc_is_permission_fault(esr));
+	if (kvm_slot_can_be_private(memslot)) {
+		ret = guest_memfd_abort(vcpu, fault_ipa, memslot,
+					esr_fsc_is_permission_fault(esr));
+	} else {
+		ret = user_mem_abort(vcpu, fault_ipa, memslot, hva,
+				     esr_fsc_is_permission_fault(esr));
+	}
+
 	if (ret == 0)
 		ret = 1;
 out:
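
A closing note on the locking in guest_memfd_abort(), for readers new to
the pattern: the mmu_invalidate_seq snapshot plus the later
mmu_invalidate_retry() check is KVM's standard guard against the pfn
lookup racing with an MMU-notifier invalidation. Distilled to its shape
(this only restates logic already present in the patch above):

	mmu_seq = kvm->mmu_invalidate_seq;	/* snapshot before the lookup */
	smp_rmb();	/* pairs with smp_wmb() in kvm_mmu_invalidate_end() */

	/* The lookup may sleep, so an invalidation can run in this window. */
	ret = kvm_gmem_get_pfn_locked(kvm, memslot, gfn, &pfn, NULL);

	read_lock(&kvm->mmu_lock);
	if (mmu_invalidate_retry(kvm, mmu_seq)) {
		/* The pfn may be stale: back out and let the vCPU refault. */
		ret = -EAGAIN;
		goto unlock_mmu;
	}
	/* mmu_lock held: no invalidation can complete until we unlock. */

Converting -EAGAIN to 0 at the end of the function then makes
kvm_handle_guest_abort() return 1, so the vCPU re-enters the guest and
simply takes the fault again.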