[0/3] vfio: Device memory DMA mapping improvements

Message ID 161315658638.7320.9686203003395567745.stgit@gimli.home

Message

Alex Williamson Feb. 12, 2021, 7:27 p.m. UTC
This series intends to improve some long-standing issues with mapping
device memory through the vfio IOMMU interface (i.e. P2P DMA mappings).
Unlike mapping DMA to RAM, we can't pin device memory, nor is it
always accessible.  We attempt to tackle this (predominantly the
first issue in this iteration) by creating a registration and
notification interface through vfio-core, between the IOMMU backend
and the bus driver.  This allows us to do things like automatically
remove a DMA mapping to a device when that device is closed by the
user.  We also
keep references to the device container such that it remains isolated
while this mapping exists.
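
As a very rough sketch of the shape of this interface (the names below
are illustrative only, not the actual functions added by the series),
the bus driver advertises the vm_operations_struct it installs on user
mmaps, and the IOMMU backend resolves a vma back to its owner and
subscribes for release events:

    /* Illustrative sketch -- hypothetical names, not the series' API. */
    #include <linux/mm.h>
    #include <linux/notifier.h>

    /* Bus driver (e.g. vfio-pci) side: tell vfio-core which vm_ops it
     * installs on device mmaps, along with a notifier head signaling
     * events such as the device being released. */
    int vfio_register_vma_ops(const struct vm_operations_struct *vm_ops,
                              struct blocking_notifier_head *notifier);

    /* IOMMU backend (e.g. type1) side: given a user vma, find the
     * owning bus driver through its registered vm_ops and subscribe,
     * so the DMA mapping can be removed when the device goes away. */
    int vfio_vma_notifier_register(struct vm_area_struct *vma,
                                   struct notifier_block *nb);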

Unlike my previous attempt[1], this version works across containers.
For example if a user has device1 with IOMMU context in container1
and device2 in container2, a mapping of device2 memory into container1
IOMMU context would be removed when device2 is released.

What I don't tackle here is when device memory is disabled, such as
for a PCI device when the command register memory bit is cleared or
while the device is in reset.  Ideally it seems like it might be
nice to have IOMMU API interfaces that could remove r/w permissions
from the IOTLB entry w/o removing it entirely, but I'm also unsure
of the ultimate value in modifying the IOTLB entries at this point.
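
For concreteness, such an interface might look something like this
(purely hypothetical, nothing of the sort exists in the IOMMU API
today):

    /* Hypothetical IOMMU API extension -- not an existing call:
     * change the permissions of an already-mapped IOVA range without
     * tearing it down, so the mapping could be cheaply restored when
     * the device memory is re-enabled. */
    int iommu_modify_prot(struct iommu_domain *domain, unsigned long iova,
                          size_t size, int prot);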

In the PCI example again, I'd expect a DMA to disabled or unavailable
device memory to get an Unsupported Request response.  If we play
with the IOTLB mapping, we might change this to an IOMMU fault for
either page permissions or page not present, depending on how we
choose to invalidate that entry.  However, it seems that a system that
escalates a UR error to fatal, through things like firmware-first
handling, is just as likely to also make the IOMMU fault fatal.  Are
there cases where we expect otherwise, and if not, is there value in
tracking device memory enable state to that extent in the IOMMU?

Jason, I'm also curious if this scratches your itch relative to your
suggestion to solve this with dma-bufs, and if that's still your
preference, I'd love an outline of how to accomplish the same with that
method.

Thanks,
Alex

[1] https://lore.kernel.org/kvm/158947414729.12590.4345248265094886807.stgit@gimli.home/

---

Alex Williamson (3):
      vfio: Introduce vma ops registration and notifier
      vfio/pci: Implement vm_ops registration
      vfio/type1: Implement vma registration and restriction


 drivers/vfio/pci/Kconfig            |    1 
 drivers/vfio/pci/vfio_pci.c         |   87 ++++++++++++++++
 drivers/vfio/pci/vfio_pci_private.h |    1 
 drivers/vfio/vfio.c                 |  120 ++++++++++++++++++++++
 drivers/vfio/vfio_iommu_type1.c     |  192 ++++++++++++++++++++++++++++-------
 include/linux/vfio.h                |   20 ++++
 6 files changed, 384 insertions(+), 37 deletions(-)

Comments

Jason Gunthorpe Feb. 12, 2021, 8:57 p.m. UTC | #1
On Fri, Feb 12, 2021 at 12:27:19PM -0700, Alex Williamson wrote:
> [...]
> 
> Jason, I'm also curious if this scratches your itch relative to your
> suggestion to solve this with dma-bufs, and if that's still your
> preference, I'd love an outline of how to accomplish the same with that
> method.

I will look at this more closely later, but given this is solving a
significant security problem and the patches now exist, I'm not
inclined to push too hard to do something different if this works OK.

That said, it is not great to see VFIO create its own little
dmabuf-like thing inside itself.  In particular, if this were in core
code we could add a new vm_operations_struct member like:

   struct dma_buf *(*getdma_buf)(struct vm_area_struct *vma);

and completely avoid a lot of the searching and fiddling with
ops.  Maybe we can make this look closer to that ideal.
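
For instance (a sketch only, assuming a getdma_buf member as above),
the IOMMU backend could resolve a device mmap to a dma-buf and hold a
reference for the lifetime of the IOVA mapping:

   /* Sketch: look up the hypothetical vm_ops hook on a vma backing a
    * device mmap; the caller would keep the returned dma_buf reference
    * until it unmaps the IOVA range. */
   static struct dma_buf *vfio_vma_to_dma_buf(struct vm_area_struct *vma)
   {
           if (!vma->vm_ops || !vma->vm_ops->getdma_buf)
                   return ERR_PTR(-ENODEV);
           return vma->vm_ops->getdma_buf(vma);
   }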

Jason