[RFC,1/3] powerpc/pseries/iommu: Bring back userspace view for single level TCE tables

Message ID: 171026725393.8367.17497620074051138306.stgit@linux.ibm.com
State: New, archived
Series: powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO

Commit Message

Shivaprasad G Bhat March 12, 2024, 6:14 p.m. UTC
The commit 090bad39b237a ("powerpc/powernv: Add indirect levels to
it_userspace") which implemented the tce indirect levels
support for PowerNV ended up removing the single level support
which existed by default(generic tce_iommu_userspace_view_alloc/free()
calls). On pSeries the TCEs are single level, and the allocation
of userspace view is lost with the removal of generic code.

This patch attempts to bring it back for pSeries on the refactored
code base.

On pSeries, the windows/tables are "borrowed", so it_ops->free()
is not called in the container detach or TCE release call paths,
as the table is not really freed. Therefore, decouple the allocation
and freeing of the userspace view array from the table's it_ops,
just the way it was before.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c |   19 ++++++++++--
 drivers/vfio/vfio_iommu_spapr_tce.c    |   51 ++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+), 3 deletions(-)
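
The single-level lookup described in the commit message can be sketched in
plain userspace C. This is only an illustration, not the kernel code: the
names sketch_table and sketch_useraddr are hypothetical stand-ins for
struct iommu_table and the useraddrptr callback, and the bounds check is an
addition that the helper in the patch does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's struct iommu_table; only the
 * fields needed for the lookup are modelled.
 */
struct sketch_table {
	unsigned long it_offset;  /* first TCE index covered by the window */
	unsigned long it_size;    /* number of entries in the window */
	uint64_t *it_userspace;   /* flat (single-level) userspace view, or NULL */
};

/*
 * Return the userspace-view slot for TCE 'index', or NULL when the view
 * has not been allocated or the index lies outside the window.
 * The range check is an assumption added for the sketch.
 */
static uint64_t *sketch_useraddr(struct sketch_table *tbl, unsigned long index)
{
	assert(tbl != NULL);

	if (!tbl->it_userspace)
		return NULL;
	if (index < tbl->it_offset || index - tbl->it_offset >= tbl->it_size)
		return NULL;

	/* Single level: the view is a plain array indexed from it_offset. */
	return &tbl->it_userspace[index - tbl->it_offset];
}
```

With a window starting at index 16, a lookup of index 16 returns the first
slot of the array, and any lookup returns NULL while the view is
unallocated, which is the situation this patch addresses on pSeries.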

Comments

Jason Gunthorpe March 19, 2024, 2:32 p.m. UTC | #1
On Tue, Mar 12, 2024 at 01:14:20PM -0500, Shivaprasad G Bhat wrote:
> The commit 090bad39b237a ("powerpc/powernv: Add indirect levels to
> it_userspace") which implemented the tce indirect levels
> support for PowerNV ended up removing the single level support
> which existed by default(generic tce_iommu_userspace_view_alloc/free()
> calls). On pSeries the TCEs are single level, and the allocation
> of userspace view is lost with the removal of generic code.

:( :(

If this has been broken since 2018 and nobody cared till now can we
please go in a direction of moving this code to the new iommu APIs
instead of doubling down on more of this old stuff that apparently
almost nobody cares about ??

Jason

Timothy Pearson March 19, 2024, 6:36 p.m. UTC | #2
----- Original Message -----
> From: "Jason Gunthorpe" <jgg@ziepe.ca>
> To: "Shivaprasad G Bhat" <sbhat@linux.ibm.com>
> Cc: "Timothy Pearson" <tpearson@raptorengineering.com>, "Alex Williamson" <alex.williamson@redhat.com>, "linuxppc-dev"
> <linuxppc-dev@lists.ozlabs.org>, "Michael Ellerman" <mpe@ellerman.id.au>, "npiggin" <npiggin@gmail.com>, "christophe
> leroy" <christophe.leroy@csgroup.eu>, "aneesh kumar" <aneesh.kumar@kernel.org>, "naveen n rao"
> <naveen.n.rao@linux.ibm.com>, "gbatra" <gbatra@linux.vnet.ibm.com>, brking@linux.vnet.ibm.com, "Alexey Kardashevskiy"
> <aik@ozlabs.ru>, robh@kernel.org, "linux-kernel" <linux-kernel@vger.kernel.org>, "kvm" <kvm@vger.kernel.org>, "aik"
> <aik@amd.com>, msuchanek@suse.de, "jroedel" <jroedel@suse.de>, "vaibhav" <vaibhav@linux.ibm.com>, svaidy@linux.ibm.com
> Sent: Tuesday, March 19, 2024 9:32:02 AM
> Subject: Re: [RFC PATCH 1/3] powerpc/pseries/iommu: Bring back userspace view for single level TCE tables

> On Tue, Mar 12, 2024 at 01:14:20PM -0500, Shivaprasad G Bhat wrote:
>> The commit 090bad39b237a ("powerpc/powernv: Add indirect levels to
>> it_userspace") which implemented the tce indirect levels
>> support for PowerNV ended up removing the single level support
>> which existed by default(generic tce_iommu_userspace_view_alloc/free()
>> calls). On pSeries the TCEs are single level, and the allocation
>> of userspace view is lost with the removal of generic code.
> 
> :( :(
> 
> If this has been broken since 2018 and nobody cared till now can we
> please go in a direction of moving this code to the new iommu APIs
> instead of doubling down on more of this old stuff that apparently
> almost nobody cares about ??
> 
> Jason

Just FYI Raptor is working on porting things over to the new APIs.
RFC patches should be posted in the next week or two.

Shivaprasad G Bhat March 20, 2024, 3:29 p.m. UTC | #3
Hi Jason,

On 3/19/24 20:02, Jason Gunthorpe wrote:
> On Tue, Mar 12, 2024 at 01:14:20PM -0500, Shivaprasad G Bhat wrote:
>> The commit 090bad39b237a ("powerpc/powernv: Add indirect levels to
>> it_userspace") which implemented the tce indirect levels
>> support for PowerNV ended up removing the single level support
>> which existed by default(generic tce_iommu_userspace_view_alloc/free()
>> calls). On pSeries the TCEs are single level, and the allocation
>> of userspace view is lost with the removal of generic code.
> :( :(
>
> If this has been broken since 2018 and nobody cared till now can we
> please go in a direction of moving this code to the new iommu APIs
> instead of doubling down on more of this old stuff that apparently
> almost nobody cares about ??

We have existing software stack deployments using VFIO userspace
device assignment running on the Power platform. We have to enable
a similar software stack on the newer-generation Power10 platform and
also in a pSeries LPAR environment. These distros rely on VFIO being
enabled in the kernel and currently have IOMMUFD disabled. This patch
series is a simpler, low-risk enablement that gets the software stack
functionally working while we continue to enable and move to IOMMUFD
in phases. We have to fix the older APIs in order to stage the
functional enablement in small increments.

We are working on iommufd support for pSeries and looking forward
to Timothy's patches.


-Thanks

Shivaprasad

> Jason

Michael Ellerman March 22, 2024, 5:49 a.m. UTC | #4
Jason Gunthorpe <jgg@ziepe.ca> writes:
> On Tue, Mar 12, 2024 at 01:14:20PM -0500, Shivaprasad G Bhat wrote:
>> The commit 090bad39b237a ("powerpc/powernv: Add indirect levels to
>> it_userspace") which implemented the tce indirect levels
>> support for PowerNV ended up removing the single level support
>> which existed by default(generic tce_iommu_userspace_view_alloc/free()
>> calls). On pSeries the TCEs are single level, and the allocation
>> of userspace view is lost with the removal of generic code.
>
> :( :(
>
> If this has been broken since 2018 and nobody cared till now can we
> please go in a direction of moving this code to the new iommu APIs
> instead of doubling down on more of this old stuff that apparently
> almost nobody cares about ??

It's broken *on pseries* (Linux as a guest), but it works fine on
powernv (aka bare metal, aka Linux as Hypervisor).

What's changed is folks are now testing it on pseries with Linux as a
nested hypervisor.

cheers

Jason Gunthorpe Oct. 3, 2024, 1:38 p.m. UTC | #5
On Tue, Mar 19, 2024 at 01:36:51PM -0500, Timothy Pearson wrote:
> > On Tue, Mar 12, 2024 at 01:14:20PM -0500, Shivaprasad G Bhat wrote:
> >> The commit 090bad39b237a ("powerpc/powernv: Add indirect levels to
> >> it_userspace") which implemented the tce indirect levels
> >> support for PowerNV ended up removing the single level support
> >> which existed by default(generic tce_iommu_userspace_view_alloc/free()
> >> calls). On pSeries the TCEs are single level, and the allocation
> >> of userspace view is lost with the removal of generic code.
> > 
> > :( :(
> > 
> > If this has been broken since 2018 and nobody cared till now can we
> > please go in a direction of moving this code to the new iommu APIs
> > instead of doubling down on more of this old stuff that apparently
> > almost nobody cares about ??
> 
> Just FYI Raptor is working on porting things over to the new APIs.
> RFC patches should be posted in the next week or two.

There was a discussion about this at LPC a few weeks ago. Did any
patches get prepared?

Jason

Patch

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index e8c4129697b1..40de8d55faef 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -143,7 +143,7 @@  static int tce_build_pSeries(struct iommu_table *tbl, long index,
 }
 
 
-static void tce_free_pSeries(struct iommu_table *tbl, long index, long npages)
+static void tce_clear_pSeries(struct iommu_table *tbl, long index, long npages)
 {
 	__be64 *tcep;
 
@@ -162,6 +162,11 @@  static unsigned long tce_get_pseries(struct iommu_table *tbl, long index)
 	return be64_to_cpu(*tcep);
 }
 
+static void tce_free_pSeries(struct iommu_table *tbl)
+{
+	/* Nothing to free: pSeries tables are borrowed, not owned. */
+}
+
 static void tce_free_pSeriesLP(unsigned long liobn, long, long, long);
 static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long);
 
@@ -576,7 +581,7 @@  struct iommu_table_ops iommu_table_lpar_multi_ops;
 
 struct iommu_table_ops iommu_table_pseries_ops = {
 	.set = tce_build_pSeries,
-	.clear = tce_free_pSeries,
+	.clear = tce_clear_pSeries,
 	.get = tce_get_pseries
 };
 
@@ -685,15 +690,23 @@  static int tce_exchange_pseries(struct iommu_table *tbl, long index, unsigned
 
 	return rc;
 }
+
+static __be64 *tce_useraddr_pSeriesLP(struct iommu_table *tbl, long index,
+				      bool __always_unused alloc)
+{
+	return tbl->it_userspace ? &tbl->it_userspace[index - tbl->it_offset] : NULL;
+}
 #endif
 
 struct iommu_table_ops iommu_table_lpar_multi_ops = {
 	.set = tce_buildmulti_pSeriesLP,
 #ifdef CONFIG_IOMMU_API
 	.xchg_no_kill = tce_exchange_pseries,
+	.useraddrptr = tce_useraddr_pSeriesLP,
 #endif
 	.clear = tce_freemulti_pSeriesLP,
-	.get = tce_get_pSeriesLP
+	.get = tce_get_pSeriesLP,
+	.free = tce_free_pSeries
 };
 
 /*
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index a94ec6225d31..1cf36d687559 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -177,6 +177,50 @@  static long tce_iommu_register_pages(struct tce_container *container,
 	return ret;
 }
 
+static long tce_iommu_userspace_view_alloc(struct iommu_table *tbl,
+		struct mm_struct *mm)
+{
+	unsigned long cb = ALIGN(sizeof(tbl->it_userspace[0]) *
+			tbl->it_size, PAGE_SIZE);
+	unsigned long *uas;
+	long ret;
+
+	if (tbl->it_indirect_levels)
+		return 0;
+
+	WARN_ON(tbl->it_userspace);
+
+	ret = account_locked_vm(mm, cb >> PAGE_SHIFT, true);
+	if (ret)
+		return ret;
+
+	uas = vzalloc(cb);
+	if (!uas) {
+		account_locked_vm(mm, cb >> PAGE_SHIFT, false);
+		return -ENOMEM;
+	}
+	tbl->it_userspace = (__be64 *) uas;
+
+	return 0;
+}
+
+static void tce_iommu_userspace_view_free(struct iommu_table *tbl,
+		struct mm_struct *mm)
+{
+	unsigned long cb = ALIGN(sizeof(tbl->it_userspace[0]) *
+			tbl->it_size, PAGE_SIZE);
+
+	if (!tbl->it_userspace)
+		return;
+
+	if (tbl->it_indirect_levels)
+		return;
+
+	vfree(tbl->it_userspace);
+	tbl->it_userspace = NULL;
+	account_locked_vm(mm, cb >> PAGE_SHIFT, false);
+}
+
 static bool tce_page_is_contained(struct mm_struct *mm, unsigned long hpa,
 		unsigned int it_page_shift)
 {
@@ -554,6 +598,12 @@  static long tce_iommu_build_v2(struct tce_container *container,
 	unsigned long hpa;
 	enum dma_data_direction dirtmp;
 
+	if (!tbl->it_userspace) {
+		ret = tce_iommu_userspace_view_alloc(tbl, container->mm);
+		if (ret)
+			return ret;
+	}
+
 	for (i = 0; i < pages; ++i) {
 		struct mm_iommu_table_group_mem_t *mem = NULL;
 		__be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry + i);
@@ -637,6 +687,7 @@  static void tce_iommu_free_table(struct tce_container *container,
 {
 	unsigned long pages = tbl->it_allocated_size >> PAGE_SHIFT;
 
+	tce_iommu_userspace_view_free(tbl, container->mm);
 	iommu_tce_table_put(tbl);
 	account_locked_vm(container->mm, pages, false);
 }
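
The allocate-on-first-map and free-on-release lifecycle added to
vfio_iommu_spapr_tce.c above can be modelled in userspace as a rough sketch.
Everything here is an assumption for illustration: sketch_mm and
sketch_account are a simplified model of locked-memory accounting (not the
real account_locked_vm()), calloc() stands in for vzalloc(), and the page
size is fixed at 4096 bytes. Unlike the patch, the "already allocated" check
is folded into the alloc helper rather than done by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed page size for the sketch */

/* Hypothetical model of a process's locked-memory budget, in pages. */
struct sketch_mm {
	unsigned long locked_pages;
	unsigned long limit_pages;
};

/* Hypothetical stand-in for the relevant struct iommu_table fields. */
struct sketch_table {
	unsigned long it_size;            /* number of TCE entries */
	unsigned int it_indirect_levels;  /* non-zero: multi-level, handled elsewhere */
	uint64_t *it_userspace;           /* flat userspace view, or NULL */
};

/* Size of the flat view, rounded up to whole pages. */
static unsigned long sketch_view_pages(const struct sketch_table *tbl)
{
	unsigned long bytes = tbl->it_size * sizeof(tbl->it_userspace[0]);

	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Charge (inc != 0) or uncharge pages against the budget. */
static int sketch_account(struct sketch_mm *mm, unsigned long pages, int inc)
{
	if (!inc) {
		assert(mm->locked_pages >= pages);
		mm->locked_pages -= pages;
		return 0;
	}
	if (mm->locked_pages + pages > mm->limit_pages)
		return -ENOMEM;
	mm->locked_pages += pages;
	return 0;
}

/* Allocate the view once; charge first and roll back on failure. */
static int sketch_view_alloc(struct sketch_table *tbl, struct sketch_mm *mm)
{
	unsigned long pages = sketch_view_pages(tbl);
	int ret;

	if (tbl->it_indirect_levels || tbl->it_userspace)
		return 0;

	ret = sketch_account(mm, pages, 1);
	if (ret)
		return ret;

	tbl->it_userspace = calloc(pages, SKETCH_PAGE_SIZE);
	if (!tbl->it_userspace) {
		sketch_account(mm, pages, 0);
		return -ENOMEM;
	}
	return 0;
}

/* Free the view and return its pages to the budget; safe to call twice. */
static void sketch_view_free(struct sketch_table *tbl, struct sketch_mm *mm)
{
	if (!tbl->it_userspace || tbl->it_indirect_levels)
		return;

	free(tbl->it_userspace);
	tbl->it_userspace = NULL;
	sketch_account(mm, sketch_view_pages(tbl), 0);
}
```

The point of the ordering is that the charge and the allocation stay
balanced on every path: a failed allocation undoes the charge, and a repeated
free is a no-op, so the accounting cannot drift when the table itself is
borrowed and outlives the container.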