mbox series

[v3,0/5] Support message-based DMA in vfio-user server

Message ID 20230907130410.498935-1-mnissler@rivosinc.com (mailing list archive)
Headers show
Series Support message-based DMA in vfio-user server | expand

Message

Mattias Nissler Sept. 7, 2023, 1:04 p.m. UTC
This series adds basic support for message-based DMA in qemu's vfio-user
server. This is useful for cases where the client does not provide file
descriptors for accessing system memory via memory mappings. My motivating use
case is to hook up device models as PCIe endpoints to a hardware design. This
works by bridging the PCIe transaction layer to vfio-user, and the endpoint
does not access memory directly, but sends memory requests TLPs to the hardware
design in order to perform DMA.

Note that there is some more work required on top of this series to get
message-based DMA to really work well:

* libvfio-user has a long-standing issue where socket communication gets messed
  up when messages are sent from both ends at the same time. See
  https://github.com/nutanix/libvfio-user/issues/279 for more details. I've
  been engaging there and a fix is in review.

* qemu currently breaks down DMA accesses into chunks of size 8 bytes at
  maximum, each of which will be handled in a separate vfio-user DMA request
  message. This is quite terrible for large DMA accesses, such as when nvme
  reads and writes page-sized blocks for example. Thus, I would like to improve
  qemu to be able to perform larger accesses, at least for indirect memory
  regions. I have something working locally, but since this will likely result
  in more involved surgery and discussion, I am leaving this to be addressed in
  a separate patch.

Changes from v1:

* Address Stefan's review comments. In particular, enforce an allocation limit
  and don't drop the map client callbacks given that map requests can fail when
  hitting size limits.

* libvfio-user version bump now included in the series.

* Tested as well on big-endian s390x. This uncovered another byte order issue
  in vfio-user server code that I've included a fix for.

Changes from v2:

* Add a preparatory patch to make bounce buffering an AddressSpace-specific
  concept.

* The total buffer size limit parameter is now per AdressSpace and can be
  configured for PCIDevice via a property.

* Store a magic value in first bytes of bounce buffer struct as a best effort
  measure to detect invalid pointers in address_space_unmap.

Mattias Nissler (5):
  softmmu: Per-AddressSpace bounce buffering
  softmmu: Support concurrent bounce buffers
  Update subprojects/libvfio-user
  vfio-user: Message-based DMA support
  vfio-user: Fix config space access byte order

 hw/pci/pci.c                  |   8 ++
 hw/remote/trace-events        |   2 +
 hw/remote/vfio-user-obj.c     |  88 +++++++++++++++++--
 include/exec/cpu-common.h     |   2 -
 include/exec/memory.h         |  39 ++++++++-
 include/hw/pci/pci_device.h   |   3 +
 softmmu/dma-helpers.c         |   4 +-
 softmmu/memory.c              |   4 +
 softmmu/physmem.c             | 155 ++++++++++++++++++----------------
 subprojects/libvfio-user.wrap |   2 +-
 10 files changed, 220 insertions(+), 87 deletions(-)

Comments

Stefan Hajnoczi Sept. 14, 2023, 2:39 p.m. UTC | #1
On Thu, Sep 07, 2023 at 06:04:05AM -0700, Mattias Nissler wrote:
> This series adds basic support for message-based DMA in qemu's vfio-user
> server. This is useful for cases where the client does not provide file
> descriptors for accessing system memory via memory mappings. My motivating use
> case is to hook up device models as PCIe endpoints to a hardware design. This
> works by bridging the PCIe transaction layer to vfio-user, and the endpoint
> does not access memory directly, but sends memory requests TLPs to the hardware
> design in order to perform DMA.
> 
> Note that there is some more work required on top of this series to get
> message-based DMA to really work well:
> 
> * libvfio-user has a long-standing issue where socket communication gets messed
>   up when messages are sent from both ends at the same time. See
>   https://github.com/nutanix/libvfio-user/issues/279 for more details. I've
>   been engaging there and a fix is in review.
> 
> * qemu currently breaks down DMA accesses into chunks of size 8 bytes at
>   maximum, each of which will be handled in a separate vfio-user DMA request
>   message. This is quite terrible for large DMA accesses, such as when nvme
>   reads and writes page-sized blocks for example. Thus, I would like to improve
>   qemu to be able to perform larger accesses, at least for indirect memory
>   regions. I have something working locally, but since this will likely result
>   in more involved surgery and discussion, I am leaving this to be addressed in
>   a separate patch.

Have you tried setting mr->ops->valid.max_access_size to something like
64 KB?

Paolo: Any suggestions for increasing DMA transaction sizes?

Stefan

> 
> Changes from v1:
> 
> * Address Stefan's review comments. In particular, enforce an allocation limit
>   and don't drop the map client callbacks given that map requests can fail when
>   hitting size limits.
> 
> * libvfio-user version bump now included in the series.
> 
> * Tested as well on big-endian s390x. This uncovered another byte order issue
>   in vfio-user server code that I've included a fix for.
> 
> Changes from v2:
> 
> * Add a preparatory patch to make bounce buffering an AddressSpace-specific
>   concept.
> 
> * The total buffer size limit parameter is now per AdressSpace and can be
>   configured for PCIDevice via a property.
> 
> * Store a magic value in first bytes of bounce buffer struct as a best effort
>   measure to detect invalid pointers in address_space_unmap.
> 
> Mattias Nissler (5):
>   softmmu: Per-AddressSpace bounce buffering
>   softmmu: Support concurrent bounce buffers
>   Update subprojects/libvfio-user
>   vfio-user: Message-based DMA support
>   vfio-user: Fix config space access byte order
> 
>  hw/pci/pci.c                  |   8 ++
>  hw/remote/trace-events        |   2 +
>  hw/remote/vfio-user-obj.c     |  88 +++++++++++++++++--
>  include/exec/cpu-common.h     |   2 -
>  include/exec/memory.h         |  39 ++++++++-
>  include/hw/pci/pci_device.h   |   3 +
>  softmmu/dma-helpers.c         |   4 +-
>  softmmu/memory.c              |   4 +
>  softmmu/physmem.c             | 155 ++++++++++++++++++----------------
>  subprojects/libvfio-user.wrap |   2 +-
>  10 files changed, 220 insertions(+), 87 deletions(-)
> 
> -- 
> 2.34.1
>
Mattias Nissler Sept. 15, 2023, 8:23 a.m. UTC | #2
On Thu, Sep 14, 2023 at 4:39 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Thu, Sep 07, 2023 at 06:04:05AM -0700, Mattias Nissler wrote:
> > This series adds basic support for message-based DMA in qemu's vfio-user
> > server. This is useful for cases where the client does not provide file
> > descriptors for accessing system memory via memory mappings. My motivating use
> > case is to hook up device models as PCIe endpoints to a hardware design. This
> > works by bridging the PCIe transaction layer to vfio-user, and the endpoint
> > does not access memory directly, but sends memory requests TLPs to the hardware
> > design in order to perform DMA.
> >
> > Note that there is some more work required on top of this series to get
> > message-based DMA to really work well:
> >
> > * libvfio-user has a long-standing issue where socket communication gets messed
> >   up when messages are sent from both ends at the same time. See
> >   https://github.com/nutanix/libvfio-user/issues/279 for more details. I've
> >   been engaging there and a fix is in review.
> >
> > * qemu currently breaks down DMA accesses into chunks of size 8 bytes at
> >   maximum, each of which will be handled in a separate vfio-user DMA request
> >   message. This is quite terrible for large DMA accesses, such as when nvme
> >   reads and writes page-sized blocks for example. Thus, I would like to improve
> >   qemu to be able to perform larger accesses, at least for indirect memory
> >   regions. I have something working locally, but since this will likely result
> >   in more involved surgery and discussion, I am leaving this to be addressed in
> >   a separate patch.
>
> Have you tried setting mr->ops->valid.max_access_size to something like
> 64 KB?

I had tried that early on, but it's not that easy unfortunately. The
memory access path eventually hits flatview_read_continue [1], where
memory_region_dispatch_read gets invoked which passes data in a single
uint64_t, which is also the unit of data that MemoryRegionOps operates
on. Thus, sizeof(uint64_t) is the current hard limit when accessing an
indirect memory region. I have some proof of concept code that extends
MemoryRegionOps with functions to read and write larger blocks, and
change the dispatching code to use these if available. I'm not sure
whether that's the right way to go though, it was just what jumped out
at me as a quick way to get what I need :-) Happy to share this code
if it helps the conversation.

There are certainly various considerations with this:
* It crossed my mind that we could introduce a separate memory region
type (I understand that indirect memory regions were originally
designed for I/O regions, accessed by the CPU, and thus naturally
limited to memop-sized accesses?). But then again perhaps we want
arbitrarily-sized accesses for potentially all memory regions, not
just those of special types?
* If we do decide to add support to MemoryRegionOps for
arbitrarily-sized accesses, that raises the question on whether this
is a 3rd, optional pair of accessors in addition to read/write and
read_with_attrs/write_with_attrs, or whether MemoryRegionOps deserves
a cleanup to expose only a single pair of arbitrarily-size accessors.
Then we'd adapt them somehow to the simpler memop-sized accessors
which existing code implements, and I think makes sense to keep for
cases where this is sufficient.
* Performance - need to keep an eye on what performance implications
these design decisions come with.

[1] https://github.com/qemu/qemu/blob/master/softmmu/physmem.c#L2744

>
> Paolo: Any suggestions for increasing DMA transaction sizes?
>
> Stefan
>
> >
> > Changes from v1:
> >
> > * Address Stefan's review comments. In particular, enforce an allocation limit
> >   and don't drop the map client callbacks given that map requests can fail when
> >   hitting size limits.
> >
> > * libvfio-user version bump now included in the series.
> >
> > * Tested as well on big-endian s390x. This uncovered another byte order issue
> >   in vfio-user server code that I've included a fix for.
> >
> > Changes from v2:
> >
> > * Add a preparatory patch to make bounce buffering an AddressSpace-specific
> >   concept.
> >
> > * The total buffer size limit parameter is now per AdressSpace and can be
> >   configured for PCIDevice via a property.
> >
> > * Store a magic value in first bytes of bounce buffer struct as a best effort
> >   measure to detect invalid pointers in address_space_unmap.
> >
> > Mattias Nissler (5):
> >   softmmu: Per-AddressSpace bounce buffering
> >   softmmu: Support concurrent bounce buffers
> >   Update subprojects/libvfio-user
> >   vfio-user: Message-based DMA support
> >   vfio-user: Fix config space access byte order
> >
> >  hw/pci/pci.c                  |   8 ++
> >  hw/remote/trace-events        |   2 +
> >  hw/remote/vfio-user-obj.c     |  88 +++++++++++++++++--
> >  include/exec/cpu-common.h     |   2 -
> >  include/exec/memory.h         |  39 ++++++++-
> >  include/hw/pci/pci_device.h   |   3 +
> >  softmmu/dma-helpers.c         |   4 +-
> >  softmmu/memory.c              |   4 +
> >  softmmu/physmem.c             | 155 ++++++++++++++++++----------------
> >  subprojects/libvfio-user.wrap |   2 +-
> >  10 files changed, 220 insertions(+), 87 deletions(-)
> >
> > --
> > 2.34.1
> >