
[v2,0/3] vfio-pci: Block user access to disabled device MMIO

Message ID 158871401328.15589.17598154478222071285.stgit@gimli.home (mailing list archive)

Message

Alex Williamson May 5, 2020, 9:54 p.m. UTC
v2:

Locking in 3/ is substantially changed to avoid the retry scenario
within the fault handler, so a caller who does not allow retry will
no longer receive a SIGBUS on contention.  IOMMU invalidations are
still not included here; I expect that to be a future follow-on
change, as we're not fundamentally changing that issue in this series.
The 'add to vma list only on fault' behavior is also still included
here; per the discussion I think it's still a valid approach and has
some advantages, particularly in a VM scenario where we potentially
defer the mapping until the MMIO BAR is actually DMA mapped into the
VM address space (or until the guest driver actually accesses the
device, if that DMA mapping is eliminated at some point).  Further
discussion and review appreciated.  Thanks,

Alex
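
For reference, the 'add to vma list only on fault' behavior has roughly
the shape sketched below.  The field and helper names (vma_lock,
vma_list, __vfio_pci_memory_enabled(), vfio_pci_add_vma()) are
placeholders for illustration, not necessarily what the patches use:

static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
{
        struct vm_area_struct *vma = vmf->vma;
        struct vfio_pci_device *vdev = vma->vm_private_data;
        vm_fault_t ret = VM_FAULT_NOPAGE;

        mutex_lock(&vdev->vma_lock);

        /* Refuse to populate the mapping while device memory is disabled */
        if (!__vfio_pci_memory_enabled(vdev)) {
                ret = VM_FAULT_SIGBUS;
                goto out_unlock;
        }

        /* First fault on this vma: start tracking it for later zapping */
        if (vfio_pci_add_vma(vdev, vma)) {
                ret = VM_FAULT_OOM;
                goto out_unlock;
        }

        /* Populate the full BAR mapping; vm_pgoff holds the base pfn */
        if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
                               vma->vm_end - vma->vm_start,
                               vma->vm_page_prot))
                ret = VM_FAULT_SIGBUS;

out_unlock:
        mutex_unlock(&vdev->vma_lock);
        return ret;
}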

v1:

Add tracking of the device memory enable bit and block/fault accesses
to device MMIO space while it is disabled.  This provides synchronous
fault handling for CPU accesses to the device and prevents the user
from triggering platform-level error handling present on some systems.
Device reset and MSI-X vector table accesses are also covered, such
that access is blocked across reset and vector table accesses do not
depend on the user's configuration of the device.
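
As a sketch of what blocking accesses while disabled can look like on
the read/write path (memory_lock, __vfio_pci_memory_enabled() and the
I/O helper below are assumed names, not necessarily those in the
patches):

static ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev,
                               char __user *buf, size_t count,
                               loff_t *ppos, bool iswrite)
{
        ssize_t ret = -EIO;

        /* Hold the memory-enable state stable across the access */
        down_read(&vdev->memory_lock);

        if (__vfio_pci_memory_enabled(vdev))
                ret = vfio_pci_do_io(vdev, buf, count, ppos, iswrite);

        up_read(&vdev->memory_lock);

        return ret;
}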

This is based on the vfio for-linus branch currently in next, which
makes use of follow_pfn() in vaddr_get_pfn() and therefore requires
patch 1/ to force the user fault in the case where a PFNMAP vma might
be DMA mapped before user access.  Further PFNMAP IOMMU invalidation
tracking is not yet included here.
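
The idea in patch 1/ is roughly: if follow_pfn() fails on a PFNMAP
vma, force the fault on the user's behalf and retry.  A simplified
sketch, assuming the follow_pfn()/fixup_user_fault() interfaces of
that era (the helper name is illustrative):

static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
                            unsigned long vaddr, unsigned long *pfn,
                            bool write_fault)
{
        int ret;

        ret = follow_pfn(vma, vaddr, pfn);
        if (!ret)
                return 0;

        /* Nothing mapped yet; fault the page in on behalf of the user */
        ret = fixup_user_fault(NULL, mm, vaddr,
                               FAULT_FLAG_REMOTE |
                               (write_fault ? FAULT_FLAG_WRITE : 0),
                               NULL);
        if (ret)
                return ret;

        /* Retry now that the PTE should be present */
        return follow_pfn(vma, vaddr, pfn);
}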

As noted in the comments, I'm copying quite a bit of the logic from
the rdma code for performing the zap_vma_ptes() calls, and I'm also
attempting to resolve lock ordering issues in the fault handler to
lockdep's satisfaction.  I'd appreciate extra eyes on these sections
in particular.
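
In simplified form, the zapping side is a walk of the tracked vmas;
the struct and field names below are illustrative, and the real code
additionally has to juggle the owning mm's mmap_sem against the vma
list lock, since the fault handler is entered with mmap_sem held:

static void vfio_pci_zap_mmaps(struct vfio_pci_device *vdev)
{
        struct vfio_pci_mmap_vma *mmap_vma, *tmp;

        mutex_lock(&vdev->vma_lock);
        list_for_each_entry_safe(mmap_vma, tmp, &vdev->vma_list, vma_next) {
                struct vm_area_struct *vma = mmap_vma->vma;

                /* The real code also holds the owning mm's mmap_sem here */
                zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);

                list_del(&mmap_vma->vma_next);
                kfree(mmap_vma);
        }
        mutex_unlock(&vdev->vma_lock);
}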

I expect this to be functionally equivalent for any well-behaved
userspace driver, but obviously there is the potential for the user to
get -EIO or SIGBUS on device access.  The device is provided to the
user with memory enabled, and device resets will restore the command
register, so by my evaluation a user would need to explicitly clear
the memory enable bit to trigger these faults.  We could potentially
remap vmas to a zero page rather than raise SIGBUS if we experience
regressions, but without known code requiring that, SIGBUS seems the
appropriate response to this condition.  Thanks,

Alex

---

Alex Williamson (3):
      vfio/type1: Support faulting PFNMAP vmas
      vfio-pci: Fault mmaps to enable vma tracking
      vfio-pci: Invalidate mmaps and block MMIO access on disabled memory


 drivers/vfio/pci/vfio_pci.c         |  321 +++++++++++++++++++++++++++++++++--
 drivers/vfio/pci/vfio_pci_config.c  |   36 +++-
 drivers/vfio/pci/vfio_pci_intrs.c   |   18 ++
 drivers/vfio/pci/vfio_pci_private.h |   12 +
 drivers/vfio/pci/vfio_pci_rdwr.c    |   12 +
 drivers/vfio/vfio_iommu_type1.c     |   36 ++++
 6 files changed, 405 insertions(+), 30 deletions(-)

Comments

Peter Xu May 7, 2020, 9:59 p.m. UTC | #1
On Tue, May 05, 2020 at 03:54:36PM -0600, Alex Williamson wrote:
> v2:
> 
> Locking in 3/ is substantially changed to avoid the retry scenario
> within the fault handler, therefore a caller who does not allow retry
> will no longer receive a SIGBUS on contention.  IOMMU invalidations
> are still not included here, I expect that will be a future follow-on
> change as we're not fundamentally changing that issue in this series.
> The 'add to vma list only on fault' behavior is also still included
> here, per the discussion I think it's still a valid approach and has
> some advantages, particularly in a VM scenario where we potentially
> defer the mapping until the MMIO BAR is actually DMA mapped into the
> VM address space (or the guest driver actually accesses the device
> if that DMA mapping is eliminated at some point).  Further discussion
> and review appreciated.  Thanks,

Hi, Alex,

I have a general question on the series.

IIUC this series tries to protect against illegal vfio userspace writes to
device MMIO regions, which may cause platform-level issues.  That makes
perfect sense to me.  However, what if the write comes from the device
side?  E.g.:

  - Device A's MMIO region X is mmapped by userspace

  - Device B does VFIO_IOMMU_DMA_MAP on device A's MMIO region X
    (so X's MMIO PFNs are mapped in device B's IOMMU page table)

  - Device A clears PCI_COMMAND_MEMORY (reset, etc.)
    - this should zap all existing vmas mapping region X, however device
      B's IOMMU page table is not aware of this?

  - Device B writes to MMIO region X of device A even though
    PCI_COMMAND_MEMORY is cleared in device A's PCI_COMMAND register

Could this happen?
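
For concreteness, the second step is roughly the following from
userspace (error handling omitted; the region info for device A's BAR
is assumed to have come from VFIO_DEVICE_GET_REGION_INFO, the IOVA is
arbitrary, and VFIO_IOMMU_MAP_DMA is the uapi spelling of the DMA map
ioctl):

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

static void map_bar_for_dma(int container_fd, int device_a_fd,
                            struct vfio_region_info *bar_info)
{
        /* Userspace mapping of device A's MMIO region X */
        void *bar = mmap(NULL, bar_info->size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, device_a_fd, bar_info->offset);

        struct vfio_iommu_type1_dma_map map = {
                .argsz = sizeof(map),
                .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
                .vaddr = (uintptr_t)bar,
                .iova  = 0x100000000ULL,        /* IOVA device B will target */
                .size  = bar_info->size,
        };

        /* X's MMIO PFNs now end up in the container's IOMMU page tables */
        ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}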

Thanks,
Alex Williamson May 7, 2020, 10:34 p.m. UTC | #2
On Thu, 7 May 2020 17:59:08 -0400
Peter Xu <peterx@redhat.com> wrote:

> On Tue, May 05, 2020 at 03:54:36PM -0600, Alex Williamson wrote:
> > v2:
> > 
> > Locking in 3/ is substantially changed to avoid the retry scenario
> > within the fault handler, therefore a caller who does not allow retry
> > will no longer receive a SIGBUS on contention.  IOMMU invalidations
> > are still not included here, I expect that will be a future follow-on
> > change as we're not fundamentally changing that issue in this series.
> > The 'add to vma list only on fault' behavior is also still included
> > here, per the discussion I think it's still a valid approach and has
> > some advantages, particularly in a VM scenario where we potentially
> > defer the mapping until the MMIO BAR is actually DMA mapped into the
> > VM address space (or the guest driver actually accesses the device
> > if that DMA mapping is eliminated at some point).  Further discussion
> > and review appreciated.  Thanks,  
> 
> Hi, Alex,
> 
> I have a general question on the series.
> 
> IIUC this series tries to protect illegal vfio userspace writes to device MMIO
> regions which may cause platform-level issues.  That makes perfect sense to me.
> However what if the write comes from the devices' side?  E.g.:
> 
>   - Device A maps MMIO region X
> 
>   - Device B do VFIO_IOMMU_DMA_MAP on Device A's MMIO region X
>     (so X's MMIO PFNs are mapped in device B's IOMMU page table)
> 
>   - Device A clears PCI_COMMAND_MEMORY (reset, etc.)
>     - this should zap all existing vmas that mapping region X, however device
>       B's IOMMU page table is not aware of this?
> 
>   - Device B writes to MMIO region X of device A even if PCI_COMMAND_MEMORY
>     cleared on device A's PCI_COMMAND register
> 
> Could this happen?

Yes, this can happen, and Jason has brought up variations on this
scenario that are important to fix as well.  I've got some ideas, but
the user access addressed in this series was the current priority.
There's also the issue in the above scenario that if a platform
considers a DMA write to an invalid IOMMU PTE, and the resulting IOMMU
fault, to have the same severity as the write to disabled MMIO space
we've prevented, then our hands are tied.  Thanks,

Alex
Peter Xu May 8, 2020, 2:31 a.m. UTC | #3
On Thu, May 07, 2020 at 04:34:37PM -0600, Alex Williamson wrote:
> On Thu, 7 May 2020 17:59:08 -0400
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Tue, May 05, 2020 at 03:54:36PM -0600, Alex Williamson wrote:
> > > v2:
> > > 
> > > Locking in 3/ is substantially changed to avoid the retry scenario
> > > within the fault handler, therefore a caller who does not allow retry
> > > will no longer receive a SIGBUS on contention.  IOMMU invalidations
> > > are still not included here, I expect that will be a future follow-on
> > > change as we're not fundamentally changing that issue in this series.
> > > The 'add to vma list only on fault' behavior is also still included
> > > here, per the discussion I think it's still a valid approach and has
> > > some advantages, particularly in a VM scenario where we potentially
> > > defer the mapping until the MMIO BAR is actually DMA mapped into the
> > > VM address space (or the guest driver actually accesses the device
> > > if that DMA mapping is eliminated at some point).  Further discussion
> > > and review appreciated.  Thanks,  
> > 
> > Hi, Alex,
> > 
> > I have a general question on the series.
> > 
> > IIUC this series tries to protect illegal vfio userspace writes to device MMIO
> > regions which may cause platform-level issues.  That makes perfect sense to me.
> > However what if the write comes from the devices' side?  E.g.:
> > 
> >   - Device A maps MMIO region X
> > 
> >   - Device B do VFIO_IOMMU_DMA_MAP on Device A's MMIO region X
> >     (so X's MMIO PFNs are mapped in device B's IOMMU page table)
> > 
> >   - Device A clears PCI_COMMAND_MEMORY (reset, etc.)
> >     - this should zap all existing vmas that mapping region X, however device
> >       B's IOMMU page table is not aware of this?
> > 
> >   - Device B writes to MMIO region X of device A even if PCI_COMMAND_MEMORY
> >     cleared on device A's PCI_COMMAND register
> > 
> > Could this happen?
> 
> Yes, this can happen and Jason has brought up variations on this
> scenario that are important to fix as well.  I've got some ideas, but
> the access in this series was the current priority.  There are also
> issues in the above scenario that if a platform considers a DMA write
> to an invalid IOMMU PTE and triggering an IOMMU fault to have the same
> severity as the write to disabled MMIO space we've prevented, then our
> hands are tied.  Thanks,

I see the point now; it makes sense to start with a series like this. Thanks, Alex.