[RFCv1,00/14] Add Tegra241 (Grace) CMDQV Support (part 2/2)

Message ID cover.1712978212.git.nicolinc@nvidia.com (mailing list archive)

Message

Nicolin Chen April 13, 2024, 3:46 a.m. UTC
This is an experimental RFC series for VIOMMU infrastructure, using NVIDIA
Tegra241 (Grace) CMDQV as a test instance.

The VIOMMU object is used to represent a virtual interface (IOMMU) backed
by an underlying IOMMU's HW-accelerated feature for virtualization: for
example, NVIDIA's VINTF (v-interface for CMDQV) and AMD's vIOMMU.

The VQUEUE object is used to represent a virtual command queue (buffer)
backed by an underlying IOMMU command queue that is passed through for VMs
to use directly: for example, NVIDIA's Virtual Command Queue and AMD's
Command Buffer.
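
To illustrate how the two objects relate, here is a minimal sketch (member
names and layouts below are illustrative assumptions, not copied from the
patches; the real definitions are in the include/linux/iommufd.h changes):

/*
 * Illustrative sketch only: these layouts are assumptions made for the
 * cover letter, not the exact definitions in the series.
 */
struct iommufd_viommu {
	struct iommufd_object obj;	/* generic iommufd object header */
	struct iommufd_ctx *ictx;
	struct iommu_device *iommu_dev;	/* physical IOMMU backing this vIOMMU */
	const struct iommufd_viommu_ops *ops;	/* driver callbacks */
};

struct iommufd_vqueue {
	struct iommufd_object obj;
	struct iommufd_viommu *viommu;	/* virtual interface it belongs to */
	/* driver data follows, e.g. which HW VCMDQ/Command Buffer it maps to */
};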

NVIDIA's CMDQV requires a pair of physical and virtual device Stream IDs
to process ATC invalidation commands by the ARM SMMU. So, set/unset_dev_id
ops and ioctls are introduced for the VIOMMU object.
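
As a rough sketch (the signatures are assumptions; the series may define
them differently), the new ops could look like this, with dev_id carrying
the virtual Stream ID that the VMM assigned to the device:

/*
 * Hedged sketch: for Tegra241 CMDQV, set_dev_id would pair the guest's
 * virtual Stream ID with the device's physical Stream ID, so that
 * guest-issued ATC invalidation commands can be translated/validated.
 */
struct iommufd_viommu_ops {
	int (*set_dev_id)(struct iommufd_viommu *viommu,
			  struct device *dev, u64 dev_id);
	void (*unset_dev_id)(struct device *dev);
};

Userspace would then issue the IOMMU_VIOMMU_SET_DEV_ID ioctl (patch 08)
with the virtual ID it chose for the device.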

Also, a passthrough queue has a pair of head and tail pointers/indexes in
the real HW registers, which should be mmap'd to user space so that the
hypervisor can map them into the VM's MMIO region directly. Thus, iommufd
needs an mmap op too.
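
For instance, a VMM might map that page roughly as below (a hedged
userspace sketch: the "viommu_id encoded in the mmap offset" scheme
mirrors the vm_pgoff open item listed below and is not a settled ABI):

#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map one page of HW queue pointer/index registers from the iommufd so
 * the VMM can expose it as a VM MMIO region. The offset encoding here
 * is an assumption for illustration only.
 */
static void *map_vqueue_page(int iommufd, uint32_t viommu_id)
{
	long psz = sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_SHARED,
		       iommufd, (off_t)viommu_id * psz);

	return p == MAP_FAILED ? NULL : p;
}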

Some todos/opens:
1. Add selftest coverage for the new ioctls.
2. The mmap op needs a way to get the viommu_id. Currently it is taken
   from vma->vm_pgoff, which might not be ideal.
3. This series is only verified with a single passthrough device that is
   behind a physical ARM SMMU. So, devices behind two or more IOMMUs might
   need some additional support (and verification).
4. Comments from AMD folks are requested for supporting AMD's vIOMMU
   feature.

This series is on Github (for review and reference only):
https://github.com/nicolinc/iommufd/commits/vcmdq_user_space-rfc-v1

Real HW tests were conducted with this QEMU branch:
https://github.com/nicolinc/qemu/commits/wip/iommufd_vcmdq/

Thanks

Nicolin Chen (14):
  iommufd: Move iommufd_object to public iommufd header
  iommufd: Swap _iommufd_object_alloc and __iommufd_object_alloc
  iommufd: Prepare for viommu structures and functions
  iommufd: Add struct iommufd_viommu and iommufd_viommu_ops
  iommufd: Add IOMMUFD_OBJ_VIOMMU and IOMMUFD_CMD_VIOMMU_ALLOC
  iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage
  iommufd: Add viommu set/unset_dev_id ops
  iommufd: Add IOMMU_VIOMMU_SET_DEV_ID ioctl
  iommufd/selftest: Add IOMMU_VIOMMU_SET_DEV_ID test coverage
  iommufd/selftest: Add IOMMU_TEST_OP_MV_CHECK_DEV_ID
  iommufd: Add struct iommufd_vqueue and its related viommu ops
  iommufd: Add IOMMUFD_OBJ_VQUEUE and IOMMUFD_CMD_VQUEUE_ALLOC
  iommufd: Add mmap infrastructure
  iommu/tegra241-cmdqv: Add user-space use support

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  19 ++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  19 ++
 .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c    | 284 +++++++++++++++++-
 drivers/iommu/iommufd/Makefile                |   3 +-
 drivers/iommu/iommufd/device.c                |  11 +
 drivers/iommu/iommufd/hw_pagetable.c          |   4 +-
 drivers/iommu/iommufd/iommufd_private.h       |  71 +++--
 drivers/iommu/iommufd/iommufd_test.h          |   5 +
 drivers/iommu/iommufd/main.c                  |  69 ++++-
 drivers/iommu/iommufd/selftest.c              | 100 ++++++
 drivers/iommu/iommufd/viommu.c                | 235 +++++++++++++++
 include/linux/iommu.h                         |  16 +
 include/linux/iommufd.h                       | 100 ++++++
 include/uapi/linux/iommufd.h                  |  98 ++++++
 tools/testing/selftests/iommu/iommufd.c       |  44 +++
 tools/testing/selftests/iommu/iommufd_utils.h |  71 +++++
 16 files changed, 1103 insertions(+), 46 deletions(-)
 create mode 100644 drivers/iommu/iommufd/viommu.c

Comments

Tian, Kevin May 22, 2024, 8:40 a.m. UTC | #1
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, April 13, 2024 11:47 AM
> 
> This is an experimental RFC series for VIOMMU infrastructure, using NVIDIA
> Tegra241 (Grace) CMDQV as a test instance.
> 
> VIOMMU obj is used to represent a virtual interface (iommu) backed by an
> underlying IOMMU's HW-accelerated feature for virtualization: for example,
> NVIDIA's VINTF (v-interface for CMDQV) and AMD's vIOMMU.
> 
> VQUEUE obj is used to represent a virtual command queue (buffer) backed by
> an underlying IOMMU command queue to passthrough for VMs to use directly:
> for example, NVIDIA's Virtual Command Queue and AMD's Command Buffer.
> 

Is VCMDQ more accurate? AMD also supports fault queue passthrough, so
VQUEUE sounds broader than a cmd queue...
Jason Gunthorpe May 22, 2024, 4:48 p.m. UTC | #2
On Wed, May 22, 2024 at 08:40:00AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Saturday, April 13, 2024 11:47 AM
> > 
> > This is an experimental RFC series for VIOMMU infrastructure, using NVIDIA
> > Tegra241 (Grace) CMDQV as a test instance.
> > 
> > VIOMMU obj is used to represent a virtual interface (iommu) backed by an
> > underlying IOMMU's HW-accelerated feature for virtualization: for example,
> > NVIDIA's VINTF (v-interface for CMDQV) and AMD's vIOMMU.
> > 
> > VQUEUE obj is used to represent a virtual command queue (buffer) backed by
> > an underlying IOMMU command queue to passthrough for VMs to use directly:
> > for example, NVIDIA's Virtual Command Queue and AMD's Command Buffer.
> > 
> 
> Is VCMDQ more accurate? AMD also supports fault queue passthrough, so
> VQUEUE sounds broader than a cmd queue...

Is there a reason VQUEUE couldn't handle the fault/etc queues too? The
only difference is direction, there is still a doorbell/etc.

Jason
Nicolin Chen May 22, 2024, 7:47 p.m. UTC | #3
On Wed, May 22, 2024 at 01:48:18PM -0300, Jason Gunthorpe wrote:
> On Wed, May 22, 2024 at 08:40:00AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Saturday, April 13, 2024 11:47 AM
> > > 
> > > This is an experimental RFC series for VIOMMU infrastructure, using NVIDIA
> > > Tegra241 (Grace) CMDQV as a test instance.
> > > 
> > > VIOMMU obj is used to represent a virtual interface (iommu) backed by an
> > > underlying IOMMU's HW-accelerated feature for virtualization: for example,
> > > NVIDIA's VINTF (v-interface for CMDQV) and AMD's vIOMMU.
> > > 
> > > VQUEUE obj is used to represent a virtual command queue (buffer) backed by
> > > an underlying IOMMU command queue to passthrough for VMs to use directly:
> > > for example, NVIDIA's Virtual Command Queue and AMD's Command Buffer.
> > > 
> > 
> > Is VCMDQ more accurate? AMD also supports fault queue passthrough, so
> > VQUEUE sounds broader than a cmd queue...
> 
> Is there a reason VQUEUE couldn't handle the fault/etc queues too? The
> only difference is direction, there is still a doorbell/etc.

Yea, SMMU also has Event Queue and PRI queue. Though I haven't
got time to sit down to look at Baolu's work closely, the uAPI
seems to be a unified one for all IOMMUs. And though I have no
intention to be against that design, yet maybe there could be
an alternative in a somewhat HW specific language as we do for
invalidation? Or not worth it?

Thanks
Nicolin
Jason Gunthorpe May 22, 2024, 11:28 p.m. UTC | #4
On Wed, May 22, 2024 at 12:47:19PM -0700, Nicolin Chen wrote:
> On Wed, May 22, 2024 at 01:48:18PM -0300, Jason Gunthorpe wrote:
> > On Wed, May 22, 2024 at 08:40:00AM +0000, Tian, Kevin wrote:
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Saturday, April 13, 2024 11:47 AM
> > > > 
> > > > This is an experimental RFC series for VIOMMU infrastructure, using NVIDIA
> > > > Tegra241 (Grace) CMDQV as a test instance.
> > > > 
> > > > VIOMMU obj is used to represent a virtual interface (iommu) backed by
> > > > an underlying IOMMU's HW-accelerated feature for virtualization: for
> > > > example, NVIDIA's VINTF (v-interface for CMDQV) and AMD's vIOMMU.
> > > > 
> > > > VQUEUE obj is used to represent a virtual command queue (buffer) backed
> > > > by an underlying IOMMU command queue to passthrough for VMs to use
> > > > directly: for example, NVIDIA's Virtual Command Queue and AMD's Command
> > > > Buffer.
> > > > 
> > > 
> > > Is VCMDQ more accurate? AMD also supports fault queue passthrough, so
> > > VQUEUE sounds broader than a cmd queue...
> > 
> > Is there a reason VQUEUE couldn't handle the fault/etc queues too? The
> > only difference is direction, there is still a doorbell/etc.
> 
> Yea, SMMU also has Event Queue and PRI queue. Though I haven't
> got time to sit down to look at Baolu's work closely, the uAPI
> seems to be a unified one for all IOMMUs. And though I have no
> intention to be against that design, yet maybe there could be
> an alternative in a somewhat HW specific language as we do for
> invalidation? Or not worth it?

I was thinking not worth it, I expect a gain here is to do as AMD has
done and make the HW dma the queues directly to guest memory.

IMHO the primary issue with the queues is DOS, as having any shared
queue across VMs is dangerous in that way. Allowing each VIOMMU to
have its own private queue and own flow control helps with that.

Jason
Tian, Kevin May 22, 2024, 11:43 p.m. UTC | #5
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, May 23, 2024 7:29 AM
> 
> On Wed, May 22, 2024 at 12:47:19PM -0700, Nicolin Chen wrote:
> > On Wed, May 22, 2024 at 01:48:18PM -0300, Jason Gunthorpe wrote:
> > > On Wed, May 22, 2024 at 08:40:00AM +0000, Tian, Kevin wrote:
> > > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > > Sent: Saturday, April 13, 2024 11:47 AM
> > > > >
> > > > > This is an experimental RFC series for VIOMMU infrastructure,
> > > > > using NVIDIA Tegra241 (Grace) CMDQV as a test instance.
> > > > >
> > > > > VIOMMU obj is used to represent a virtual interface (iommu) backed
> > > > > by an underlying IOMMU's HW-accelerated feature for virtualization:
> > > > > for example, NVIDIA's VINTF (v-interface for CMDQV) and AMD's
> > > > > vIOMMU.
> > > > >
> > > > > VQUEUE obj is used to represent a virtual command queue (buffer)
> > > > > backed by an underlying IOMMU command queue to passthrough for VMs
> > > > > to use directly: for example, NVIDIA's Virtual Command Queue and
> > > > > AMD's Command Buffer.
> > > > >
> > > >
> > > > Is VCMDQ more accurate? AMD also supports fault queue passthrough, so
> > > > VQUEUE sounds broader than a cmd queue...
> > >
> > > Is there a reason VQUEUE couldn't handle the fault/etc queues too? The
> > > only difference is direction, there is still a doorbell/etc.

No reason. The description made it specific to a cmd queue, which gave
me the impression that we may want to create a separate fault queue.

> >
> > Yea, SMMU also has Event Queue and PRI queue. Though I haven't
> > got time to sit down to look at Baolu's work closely, the uAPI
> > seems to be a unified one for all IOMMUs. And though I have no
> > intention to be against that design, yet maybe there could be
> > an alternative in a somewhat HW specific language as we do for
> > invalidation? Or not worth it?
> 
> I was thinking not worth it, I expect a gain here is to do as AMD has
> done and make the HW dma the queues directly to guest memory.
> 
> IMHO the primary issue with the queues is DOS, as having any shared
> queue across VMs is dangerous in that way. Allowing each VIOMMU to
> have its own private queue and own flow control helps with that.
> 

and also a shorter delivery path with less data copying?
Nicolin Chen May 23, 2024, 3:09 a.m. UTC | #6
On Wed, May 22, 2024 at 11:43:51PM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, May 23, 2024 7:29 AM
> > On Wed, May 22, 2024 at 12:47:19PM -0700, Nicolin Chen wrote:
> > > Yea, SMMU also has Event Queue and PRI queue. Though I haven't
> > > got time to sit down to look at Baolu's work closely, the uAPI
> > > seems to be a unified one for all IOMMUs. And though I have no
> > > intention to be against that design, yet maybe there could be
> > > an alternative in a somewhat HW specific language as we do for
> > > invalidation? Or not worth it?
> >
> > I was thinking not worth it, I expect a gain here is to do as AMD has
> > done and make the HW dma the queues directly to guest memory.
> >
> > IMHO the primary issue with the queues is DOS, as having any shared
> > queue across VMs is dangerous in that way. Allowing each VIOMMU to
> > have its own private queue and own flow control helps with that.
> >
> 
> and also a shorter delivery path with less data copying?

Should I interpret that as a yes for fault reporting via VQUEUE?

Only AMD can HW-DMA the events to the guest queue memory. All others
need a backward translation of (at least) a physical dev ID to a
virtual dev ID. This is now doable in the kernel with the ongoing
vdev_id design, by the way. So the kernel could then write the guest
memory directly to report events?
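
Something like this, as a purely illustrative sketch (the vdev_ids
xarray member is an assumption, not an existing struct field):

/*
 * Illustrative only: a per-VIOMMU reverse mapping that the vdev_id
 * design could maintain, so the kernel can rewrite the physical dev
 * ID in a HW event into the guest-visible one before reporting it.
 */
static u64 viommu_to_vdev_id(struct iommufd_viommu *viommu, u64 phys_id)
{
	/* assumes a vdev_ids xarray indexed by the physical ID */
	void *entry = xa_load(&viommu->vdev_ids, phys_id);

	return entry ? (u64)(uintptr_t)entry : 0;
}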

Thanks
Nicolin
Jason Gunthorpe May 23, 2024, 12:48 p.m. UTC | #7
On Wed, May 22, 2024 at 08:09:12PM -0700, Nicolin Chen wrote:
> On Wed, May 22, 2024 at 11:43:51PM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Thursday, May 23, 2024 7:29 AM
> > > On Wed, May 22, 2024 at 12:47:19PM -0700, Nicolin Chen wrote:
> > > > Yea, SMMU also has Event Queue and PRI queue. Though I haven't
> > > > got time to sit down to look at Baolu's work closely, the uAPI
> > > > seems to be a unified one for all IOMMUs. And though I have no
> > > > intention to be against that design, yet maybe there could be
> > > > an alternative in a somewhat HW specific language as we do for
> > > > invalidation? Or not worth it?
> > >
> > > I was thinking not worth it, I expect a gain here is to do as AMD has
> > > done and make the HW dma the queues directly to guest memory.
> > >
> > > IMHO the primary issue with the queues is DOS, as having any shared
> > > queue across VMs is dangerous in that way. Allowing each VIOMMU to
> > > have its own private queue and own flow control helps with that.
> > >
> > 
> > and also a shorter delivery path with less data copying?
> 
> Should I interpret that as a yes for fault reporting via VQUEUE?
> 
> Only AMD can HW-DMA the events to the guest queue memory. All others
> need a backward translation of (at least) a physical dev ID to a
> virtual dev ID. This is now doable in the kernel with the ongoing
> vdev_id design, by the way. So the kernel could then write the guest
> memory directly to report events?

I don't think we should get into the kernel doing direct access at this
point; let's focus on basic functionality before we get to
micro-optimizations like that.

So long as the API could support doing something like that, it could be
done after benchmarking/etc.

Jason