[RFC,v2,12/22] iommufd: Allow mapping from guest_memfd

Message ID 20250218111017.491719-13-aik@amd.com (mailing list archive)
State RFC
Delegated to: Bjorn Helgaas
Series TSM: Secure VFIO, TDISP, SEV TIO

Commit Message

Alexey Kardashevskiy Feb. 18, 2025, 11:09 a.m. UTC
CoCo VMs get their private memory allocated from guest_memfd
("gmemfd"), a KVM facility similar to memfd.
At the moment gmemfds do not support mmap(), so the usual GUP
API does not work on them as expected.

Use the existing IOMMU_IOAS_MAP_FILE API to allow mapping from
fd + offset. Detect the gmemfd case in pfn_reader_user_pin() and
use a simplified mapping path.

The long term plan is to ditch this workaround and follow
the usual memfd path.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
---
 drivers/iommu/iommufd/pages.c | 88 +++++++++++++++++++-
 1 file changed, 87 insertions(+), 1 deletion(-)
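
For context, a minimal userspace sketch (not part of this series) of the
flow the patch enables: create a gmemfd and map it through
IOMMU_IOAS_MAP_FILE. It assumes an existing KVM VM fd, an iommufd with an
allocated IOAS, and kernel headers that provide KVM_CREATE_GUEST_MEMFD
and IOMMU_IOAS_MAP_FILE; error handling is elided.

#include <sys/ioctl.h>
#include <linux/kvm.h>
#include <linux/iommufd.h>

static int map_gmemfd_into_ioas(int vm_fd, int iommufd, __u32 ioas_id,
				__u64 iova, __u64 size)
{
	struct kvm_create_guest_memfd gmem = { .size = size };
	struct iommu_ioas_map_file map = {
		.size = sizeof(map),
		.flags = IOMMU_IOAS_MAP_FIXED_IOVA |
			 IOMMU_IOAS_MAP_READABLE |
			 IOMMU_IOAS_MAP_WRITEABLE,
		.ioas_id = ioas_id,
		.start = 0,		/* offset into the gmemfd */
		.length = size,
		.iova = iova,
	};
	int gmemfd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

	if (gmemfd < 0)
		return gmemfd;

	map.fd = gmemfd;
	return ioctl(iommufd, IOMMU_IOAS_MAP_FILE, &map);
}

The mapping is described purely as fd + offset; no userspace virtual
address, and therefore no GUP, is involved.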

Comments

Jason Gunthorpe Feb. 18, 2025, 2:16 p.m. UTC | #1
On Tue, Feb 18, 2025 at 10:09:59PM +1100, Alexey Kardashevskiy wrote:
> CoCo VMs get their private memory allocated from guest_memfd
> ("gmemfd"), a KVM facility similar to memfd.
> At the moment gmemfds do not support mmap(), so the usual GUP
> API does not work on them as expected.
> 
> Use the existing IOMMU_IOAS_MAP_FILE API to allow mapping from
> fd + offset. Detect the gmemfd case in pfn_reader_user_pin() and
> use a simplified mapping path.
> 
> The long term plan is to ditch this workaround and follow
> the usual memfd path.

How is that possible though?

> +static struct folio *guest_memfd_get_pfn(struct file *file, unsigned long index,
> +					 unsigned long *pfn, int *max_order)
> +{
> +	struct folio *folio;
> +	int ret = 0;
> +
> +	folio = filemap_grab_folio(file_inode(file)->i_mapping, index);
> +
> +	if (IS_ERR(folio))
> +		return folio;
> +
> +	if (folio_test_hwpoison(folio)) {
> +		folio_unlock(folio);
> +		folio_put(folio);
> +		return ERR_PTR(-EHWPOISON);
> +	}
> +
> +	*pfn = folio_pfn(folio) + (index & (folio_nr_pages(folio) - 1));
> +	if (!max_order)
> +		goto unlock_exit;
> +
> +	/* Refs for unpin_user_page_range_dirty_lock->gup_put_folio(FOLL_PIN) */
> +	ret = folio_add_pins(folio, 1);
> +	folio_put(folio); /* Drop ref from filemap_grab_folio */
> +
> +unlock_exit:
> +	folio_unlock(folio);
> +	if (ret)
> +		folio = ERR_PTR(ret);
> +
> +	return folio;
> +}

Connecting iommufd to guestmemfd through the FD is broadly the right
idea, but I'm not sure this matches the design of guestmemfd regarding
pinnability. IIRC they were adamant that the pages would not be
pinned.

folio_add_pins() just prevents the folio from being freed, it doesn't
prevent the guestmemfd code from messing with the filemap.

You should separate this from the rest of the series and discuss it
directly with the guestmemfd maintainers.
 
As I understood it, the requirement here is to have some kind of
invalidation callback so that iommufd can drop mappings, but I don't
really know, and AFAIK AMD is special in wanting private pages mapped
to the hypervisor iommu.

Jason
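
To illustrate the invalidation-callback idea raised above, here is a
purely hypothetical sketch; none of these names exist in the kernel
today, and the real interface would have to come out of a discussion
with the guest_memfd maintainers.

/*
 * Hypothetical: a guest_memfd -> importer invalidation hook.
 * guest_memfd would call back before it removes or converts pages,
 * and iommufd would zap the affected IOVA mappings instead of
 * relying on long-term pins to freeze the filemap.
 */
struct guest_memfd_invalidate_ops {
	/* called with the file range [start, end) about to go away */
	void (*invalidate)(void *data, pgoff_t start, pgoff_t end);
};

/* hypothetical registration, e.g. done at IOMMU_IOAS_MAP_FILE time */
int guest_memfd_register_invalidate(struct file *gmemfd,
				    const struct guest_memfd_invalidate_ops *ops,
				    void *data);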

Patch

diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index 3427749bc5ce..457d8eaacd2c 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -53,6 +53,7 @@ 
 #include <linux/overflow.h>
 #include <linux/slab.h>
 #include <linux/sched/mm.h>
+#include <linux/pagemap.h>
 
 #include "double_span.h"
 #include "io_pagetable.h"
@@ -850,6 +851,88 @@  static long pin_memfd_pages(struct pfn_reader_user *user, unsigned long start,
 	return npages_out;
 }
 
+static bool is_guest_memfd(struct file *file)
+{
+	struct address_space *mapping = file_inode(file)->i_mapping;
+
+	return mapping_inaccessible(mapping) && mapping_unevictable(mapping);
+}
+
+static struct folio *guest_memfd_get_pfn(struct file *file, unsigned long index,
+					 unsigned long *pfn, int *max_order)
+{
+	struct folio *folio;
+	int ret = 0;
+
+	folio = filemap_grab_folio(file_inode(file)->i_mapping, index);
+
+	if (IS_ERR(folio))
+		return folio;
+
+	if (folio_test_hwpoison(folio)) {
+		folio_unlock(folio);
+		folio_put(folio);
+		return ERR_PTR(-EHWPOISON);
+	}
+
+	*pfn = folio_pfn(folio) + (index & (folio_nr_pages(folio) - 1));
+	if (!max_order)
+		goto unlock_exit;
+
+	/* Refs for unpin_user_page_range_dirty_lock->gup_put_folio(FOLL_PIN) */
+	ret = folio_add_pins(folio, 1);
+	folio_put(folio); /* Drop ref from filemap_grab_folio */
+
+unlock_exit:
+	folio_unlock(folio);
+	if (ret)
+		folio = ERR_PTR(ret);
+
+	return folio;
+}
+
+static long pin_guest_memfd_pages(struct pfn_reader_user *user, loff_t start, unsigned long npages,
+			       struct iopt_pages *pages)
+{
+	unsigned long offset = 0;
+	loff_t uptr = start;
+	long rc = 0;
+
+	for (unsigned long i = 0; i < npages; ++i, uptr += PAGE_SIZE) {
+		unsigned long gfn = 0, pfn = 0;
+		int max_order = 0;
+		struct folio *folio;
+
+		folio = guest_memfd_get_pfn(user->file, uptr >> PAGE_SHIFT, &pfn, &max_order);
+		if (IS_ERR(folio))
+			rc = PTR_ERR(folio);
+
+		if (rc == -EINVAL && i == 0) {
+			pr_err_once("Must be vfio mmio at gfn=%lx pfn=%lx, skipping\n", gfn, pfn);
+			return rc;
+		}
+
+		if (rc) {
+			pr_err("%s: %ld %ld %lx -> %lx\n", __func__,
+			       rc, i, (unsigned long) uptr, (unsigned long) pfn);
+			break;
+		}
+
+		if (i == 0)
+			offset = offset_in_folio(folio, start);
+
+		user->ufolios[i] = folio;
+	}
+
+	if (!rc) {
+		rc = npages;
+		user->ufolios_next = user->ufolios;
+		user->ufolios_offset = offset;
+	}
+
+	return rc;
+}
+
 static int pfn_reader_user_pin(struct pfn_reader_user *user,
 			       struct iopt_pages *pages,
 			       unsigned long start_index,
@@ -903,7 +986,10 @@  static int pfn_reader_user_pin(struct pfn_reader_user *user,
 
 	if (user->file) {
 		start = pages->start + (start_index * PAGE_SIZE);
-		rc = pin_memfd_pages(user, start, npages);
+		if (is_guest_memfd(user->file))
+			rc = pin_guest_memfd_pages(user, start, npages, pages);
+		else
+			rc = pin_memfd_pages(user, start, npages);
 	} else if (!remote_mm) {
 		uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
 		rc = pin_user_pages_fast(uptr, npages, user->gup_flags,