From patchwork Tue Mar 18 16:20:44 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fuad Tabba <tabba@google.com>
X-Patchwork-Id: 14021249
Date: Tue, 18 Mar 2025 16:20:44 +0000
In-Reply-To: <20250318162046.4016367-1-tabba@google.com>
Mime-Version: 1.0
References: <20250318162046.4016367-1-tabba@google.com>
X-Mailer: git-send-email 2.49.0.rc1.451.g8f38331e32-goog
Message-ID: <20250318162046.4016367-6-tabba@google.com>
Subject: [PATCH v6 5/7] KVM: guest_memfd: Restore folio state after final folio_put()
From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
Cc: pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
 anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
 aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
 brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
 xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
 isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
 vannapurve@google.com, ackerleytng@google.com, mail@maciej.szmigiero.name,
 david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com,
 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, catalin.marinas@arm.com,
 james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev,
 maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com,
 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
 rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
 jthoughton@google.com, peterx@redhat.com, tabba@google.com
Before transitioning a guest_memfd folio to unshared, thereby disallowing
access by the host and allowing the hypervisor to transition its view of
the guest page to private, we need to be sure that the host doesn't have
any references to the folio.

This patch uses the guest_memfd folio type to register a callback that
informs the guest_memfd subsystem when the last reference is dropped,
therefore knowing that the host doesn't have any remaining references.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
The function kvm_gmem_slot_register_callback() isn't used in this series.
It will be used later in code that performs unsharing of memory. I have
tested it with pKVM, based on downstream code [*]. It's included in this
RFC since it demonstrates the plan to handle unsharing of private folios.

[*] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.13-v6-pkvm
---
 include/linux/kvm_host.h |   6 ++
 virt/kvm/guest_memfd.c   | 142 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 147 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bf82faf16c53..d9d9d72d8beb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2607,6 +2607,7 @@ int kvm_gmem_slot_set_shared(struct kvm_memory_slot *slot, gfn_t start,
 int kvm_gmem_slot_clear_shared(struct kvm_memory_slot *slot, gfn_t start,
 			       gfn_t end);
 bool kvm_gmem_slot_is_guest_shared(struct kvm_memory_slot *slot, gfn_t gfn);
+int kvm_gmem_slot_register_callback(struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_gmem_handle_folio_put(struct folio *folio);
 #else
 static inline int kvm_gmem_set_shared(struct kvm *kvm, gfn_t start, gfn_t end)
@@ -2638,6 +2639,11 @@ static inline bool kvm_gmem_slot_is_guest_shared(struct kvm_memory_slot *slot,
 	WARN_ON_ONCE(1);
 	return false;
 }
+static inline int kvm_gmem_slot_register_callback(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
 #endif /* CONFIG_KVM_GMEM_SHARED_MEM */
 
 #endif
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 4b857ab421bf..4fd9e5760503 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -391,6 +391,28 @@ enum folio_shareability {
 	KVM_GMEM_NONE_SHARED = 0b11, /* Not shared, transient state. */
 };
 
+/*
+ * Unregisters the __folio_put() callback from the folio.
+ *
+ * Restores a folio's refcount after all pending references have been released,
+ * and removes the folio type, thereby removing the callback. Now the folio can
+ * be freed normally once all actual references have been dropped.
+ *
+ * Must be called with the filemap (inode->i_mapping) invalidate_lock held, and
+ * the folio must be locked.
+ */
+static void kvm_gmem_restore_pending_folio(struct folio *folio, const struct inode *inode)
+{
+	rwsem_assert_held_write_nolockdep(&inode->i_mapping->invalidate_lock);
+	WARN_ON_ONCE(!folio_test_locked(folio));
+
+	if (WARN_ON_ONCE(folio_mapped(folio) || !folio_test_guestmem(folio)))
+		return;
+
+	__folio_clear_guestmem(folio);
+	folio_ref_add(folio, folio_nr_pages(folio));
+}
+
 static int kvm_gmem_offset_set_shared(struct inode *inode, pgoff_t index)
 {
 	struct xarray *shared_offsets = &kvm_gmem_private(inode)->shared_offsets;
@@ -398,6 +420,24 @@ static int kvm_gmem_offset_set_shared(struct inode *inode, pgoff_t index)
 
 	rwsem_assert_held_write_nolockdep(&inode->i_mapping->invalidate_lock);
 
+	/*
+	 * If the folio is NONE_SHARED, it indicates that it is transitioning to
+	 * private (GUEST_SHARED). Transition it to shared (ALL_SHARED)
+	 * immediately, and remove the callback.
+	 */
+	if (xa_to_value(xa_load(shared_offsets, index)) == KVM_GMEM_NONE_SHARED) {
+		struct folio *folio = filemap_lock_folio(inode->i_mapping, index);
+
+		if (WARN_ON_ONCE(IS_ERR(folio)))
+			return PTR_ERR(folio);
+
+		if (folio_test_guestmem(folio))
+			kvm_gmem_restore_pending_folio(folio, inode);
+
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+
 	return xa_err(xa_store(shared_offsets, index, xval, GFP_KERNEL));
 }
 
@@ -498,9 +538,109 @@ static int kvm_gmem_offset_range_clear_shared(struct inode *inode,
 	return r;
 }
 
+/*
+ * Registers a callback to __folio_put(), so that gmem knows that the host does
+ * not have any references to the folio. The callback itself is registered by
+ * setting the folio type to guestmem.
+ *
+ * Returns 0 if a callback was registered or already has been registered, or
+ * -EAGAIN if the host has references, indicating a callback wasn't registered.
+ *
+ * Must be called with the filemap (inode->i_mapping) invalidate_lock held, and
+ * the folio must be locked.
+ */
+static int kvm_gmem_register_callback(struct folio *folio, struct inode *inode, pgoff_t index)
+{
+	struct xarray *shared_offsets = &kvm_gmem_private(inode)->shared_offsets;
+	void *xval_guest = xa_mk_value(KVM_GMEM_GUEST_SHARED);
+	int refcount;
+	int r = 0;
+
+	rwsem_assert_held_write_nolockdep(&inode->i_mapping->invalidate_lock);
+	WARN_ON_ONCE(!folio_test_locked(folio));
+
+	if (folio_test_guestmem(folio))
+		return 0;
+
+	if (folio_mapped(folio))
+		return -EAGAIN;
+
+	refcount = folio_ref_count(folio);
+	if (!folio_ref_freeze(folio, refcount))
+		return -EAGAIN;
+
+	/*
+	 * Register callback by setting the folio type and subtracting gmem's
+	 * references for it to trigger once outstanding references are dropped.
+	 */
+	if (refcount > 1) {
+		__folio_set_guestmem(folio);
+		refcount -= folio_nr_pages(folio);
+	} else {
+		/* No outstanding references, transition it to guest shared. */
+		r = WARN_ON_ONCE(xa_err(xa_store(shared_offsets, index, xval_guest, GFP_KERNEL)));
+	}
+
+	folio_ref_unfreeze(folio, refcount);
+	return r;
+}
+
+int kvm_gmem_slot_register_callback(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	unsigned long pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;
+	struct inode *inode = file_inode(READ_ONCE(slot->gmem.file));
+	struct folio *folio;
+	int r;
+
+	filemap_invalidate_lock(inode->i_mapping);
+
+	folio = filemap_lock_folio(inode->i_mapping, pgoff);
+	if (WARN_ON_ONCE(IS_ERR(folio))) {
+		r = PTR_ERR(folio);
+		goto out;
+	}
+
+	r = kvm_gmem_register_callback(folio, inode, pgoff);
+
+	folio_unlock(folio);
+	folio_put(folio);
+out:
+	filemap_invalidate_unlock(inode->i_mapping);
+
+	return r;
+}
+EXPORT_SYMBOL_GPL(kvm_gmem_slot_register_callback);
+
+/*
+ * Callback function for __folio_put(), i.e., called once all references by the
+ * host to the folio have been dropped. This allows gmem to transition the state
+ * of the folio to shared with the guest, and allows the hypervisor to continue
+ * transitioning its state to private, since the host cannot attempt to access
+ * it anymore.
+ */
 void kvm_gmem_handle_folio_put(struct folio *folio)
 {
-	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
+	struct address_space *mapping;
+	struct xarray *shared_offsets;
+	struct inode *inode;
+	pgoff_t index;
+	void *xval;
+
+	mapping = folio->mapping;
+	if (WARN_ON_ONCE(!mapping))
+		return;
+
+	inode = mapping->host;
+	index = folio->index;
+	shared_offsets = &kvm_gmem_private(inode)->shared_offsets;
+	xval = xa_mk_value(KVM_GMEM_GUEST_SHARED);
+
+	filemap_invalidate_lock(inode->i_mapping);
+	folio_lock(folio);
+	kvm_gmem_restore_pending_folio(folio, inode);
+	folio_unlock(folio);
+	WARN_ON_ONCE(xa_err(xa_store(shared_offsets, index, xval, GFP_KERNEL)));
+	filemap_invalidate_unlock(inode->i_mapping);
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put);
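[Editor's note, not part of the patch] The refcount handshake between kvm_gmem_register_callback() and kvm_gmem_handle_folio_put() can be modeled outside the kernel in a few lines of plain C. This is a hypothetical user-space sketch, not kernel code: `struct mock_folio` and the `mock_*` helpers are invented stand-ins, it assumes a single-page folio (so gmem drops one reference rather than folio_nr_pages()), and the locking, folio_mapped() check, and freeze-failure path are omitted.

```c
#include <stdbool.h>

/* Invented stand-in for a folio; only the fields the model needs. */
struct mock_folio {
	int refcount;       /* models folio_ref_count() */
	bool is_guestmem;   /* models the guestmem folio type (callback armed) */
	bool guest_shared;  /* models the KVM_GMEM_GUEST_SHARED xarray state */
};

/* Models kvm_gmem_handle_folio_put(): runs when the last reference drops. */
static void mock_handle_put(struct mock_folio *f)
{
	f->is_guestmem = false;  /* unregister the callback */
	f->refcount += 1;        /* restore gmem's own reference */
	f->guest_shared = true;  /* page is now safely guest-shared */
}

/*
 * Models kvm_gmem_register_callback(). If the host still holds references,
 * arm the put callback and give up gmem's reference so the host's final put
 * triggers it; otherwise transition to guest-shared immediately.
 */
static int mock_register_callback(struct mock_folio *f)
{
	if (f->is_guestmem)
		return 0;  /* callback already registered */

	if (f->refcount > 1) {
		f->is_guestmem = true;
		f->refcount -= 1;  /* single-page folio assumed */
	} else {
		f->guest_shared = true;
	}
	return 0;
}

/* Models folio_put() by the host. */
static void mock_put(struct mock_folio *f)
{
	if (--f->refcount == 0 && f->is_guestmem)
		mock_handle_put(f);
}
```

The design point the model mirrors is that gmem gives up its own reference when arming the callback, so it is the host's final put, not gmem, that observes the count reaching zero and completes the shared transition.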