diff mbox series

[RFC,v2,1/5] vhost-user: Add VIRTIO Shared Memory map request

Message ID 20240628145710.1516121-2-aesteve@redhat.com (mailing list archive)
State New, archived
Headers show
Series vhost-user: Add SHMEM_MAP/UNMAP requests | expand

Commit Message

Albert Esteve June 28, 2024, 2:57 p.m. UTC
Add SHMEM_MAP/UNMAP requests to vhost-user to
handle VIRTIO Shared Memory mappings.

This request allows backends to dynamically map
fds into a VIRTIO Shared Memory Region indentified
by its `shmid`. Then, the fd memory is advertised
to the driver as a base addres + offset, so it
can be read/written (depending on the mmap flags
requested) while its valid.

The backend can munmap the memory range
in a given VIRTIO Shared Memory Region (again,
identified by its `shmid`), to free it. Upon
receiving this message, the front-end must
mmap the regions with PROT_NONE to reserve
the virtual memory space.

The device model needs to create MemoryRegion
instances for the VIRTIO Shared Memory Regions
and add them to the `VirtIODevice` instance.

Signed-off-by: Albert Esteve <aesteve@redhat.com>
---
 docs/interop/vhost-user.rst               |  27 +++++
 hw/virtio/vhost-user.c                    | 122 ++++++++++++++++++++++
 hw/virtio/virtio.c                        |  12 +++
 include/hw/virtio/virtio.h                |   5 +
 subprojects/libvhost-user/libvhost-user.c |  65 ++++++++++++
 subprojects/libvhost-user/libvhost-user.h |  53 ++++++++++
 6 files changed, 284 insertions(+)

Comments

Stefan Hajnoczi July 11, 2024, 7:45 a.m. UTC | #1
On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
> Add SHMEM_MAP/UNMAP requests to vhost-user to
> handle VIRTIO Shared Memory mappings.
> 
> This request allows backends to dynamically map
> fds into a VIRTIO Shared Memory Region indentified
> by its `shmid`. Then, the fd memory is advertised
> to the driver as a base addres + offset, so it
> can be read/written (depending on the mmap flags
> requested) while its valid.
> 
> The backend can munmap the memory range
> in a given VIRTIO Shared Memory Region (again,
> identified by its `shmid`), to free it. Upon
> receiving this message, the front-end must
> mmap the regions with PROT_NONE to reserve
> the virtual memory space.
> 
> The device model needs to create MemoryRegion
> instances for the VIRTIO Shared Memory Regions
> and add them to the `VirtIODevice` instance.
> 
> Signed-off-by: Albert Esteve <aesteve@redhat.com>
> ---
>  docs/interop/vhost-user.rst               |  27 +++++
>  hw/virtio/vhost-user.c                    | 122 ++++++++++++++++++++++
>  hw/virtio/virtio.c                        |  12 +++
>  include/hw/virtio/virtio.h                |   5 +
>  subprojects/libvhost-user/libvhost-user.c |  65 ++++++++++++
>  subprojects/libvhost-user/libvhost-user.h |  53 ++++++++++
>  6 files changed, 284 insertions(+)
> 
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index d8419fd2f1..d52ba719d5 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -1859,6 +1859,33 @@ is sent by the front-end.
>    when the operation is successful, or non-zero otherwise. Note that if the
>    operation fails, no fd is sent to the backend.
>  
> +``VHOST_USER_BACKEND_SHMEM_MAP``
> +  :id: 9
> +  :equivalent ioctl: N/A
> +  :request payload: fd and ``struct VhostUserMMap``
> +  :reply payload: N/A
> +
> +  This message can be submitted by the backends to advertise a new mapping
> +  to be made in a given VIRTIO Shared Memory Region. Upon receiving the message,
> +  The front-end will mmap the given fd into the VIRTIO Shared Memory Region
> +  with the requested ``shmid``. A reply is generated indicating whether mapping
> +  succeeded.
> +
> +  Mapping over an already existing map is not allowed and request shall fail.
> +  Therefore, the memory range in the request must correspond with a valid,
> +  free region of the VIRTIO Shared Memory Region.
> +
> +``VHOST_USER_BACKEND_SHMEM_UNMAP``
> +  :id: 10
> +  :equivalent ioctl: N/A
> +  :request payload: ``struct VhostUserMMap``
> +  :reply payload: N/A
> +
> +  This message can be submitted by the backends so that the front-end un-mmap
> +  a given range (``offset``, ``len``) in the VIRTIO Shared Memory Region with

s/offset/shm_offset/

> +  the requested ``shmid``.

Please clarify that <offset, len> must correspond to the entirety of a
valid mapped region.

By the way, the VIRTIO 1.3 gives the following behavior for the virtiofs
DAX Window:

  When a FUSE_SETUPMAPPING request perfectly overlaps a previous
  mapping, the previous mapping is replaced. When a mapping partially
  overlaps a previous mapping, the previous mapping is split into one or
  two smaller mappings. When a mapping is partially unmapped it is also
  split into one or two smaller mappings.

  Establishing new mappings or splitting existing mappings consumes
  resources. If the device runs out of resources the FUSE_SETUPMAPPING
  request fails until resources are available again following
  FUSE_REMOVEMAPPING.

I think SETUPMAPPING/REMOVMAPPING can be implemented using
SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
partial ranges, but as far as I know that's not necessary for virtiofs
in practice.

It's worth mentioning that mappings consume resources and that SHMEM_MAP
can fail when there are no resources available. The process-wide limit
is vm.max_map_count on Linux although a vhost-user frontend may reduce
it further to control vhost-user resource usage.

> +  A reply is generated indicating whether unmapping succeeded.
> +
>  .. _reply_ack:
>  
>  VHOST_USER_PROTOCOL_F_REPLY_ACK
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index cdf9af4a4b..7ee8a472c6 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
>      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
>      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
>      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
> +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
>      VHOST_USER_BACKEND_MAX
>  }  VhostUserBackendRequest;
>  
> @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
>      unsigned char uuid[16];
>  } VhostUserShared;
>  
> +/* For the flags field of VhostUserMMap */
> +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> +
> +typedef struct {
> +    /* VIRTIO Shared Memory Region ID */
> +    uint8_t shmid;
> +    uint8_t padding[7];
> +    /* File offset */
> +    uint64_t fd_offset;
> +    /* Offset within the VIRTIO Shared Memory Region */
> +    uint64_t shm_offset;
> +    /* Size of the mapping */
> +    uint64_t len;
> +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> +    uint64_t flags;
> +} VhostUserMMap;
> +
>  typedef struct {
>      VhostUserRequest request;
>  
> @@ -224,6 +244,7 @@ typedef union {
>          VhostUserInflight inflight;
>          VhostUserShared object;
>          VhostUserTransferDeviceState transfer_state;
> +        VhostUserMMap mmap;
>  } VhostUserPayload;
>  
>  typedef struct VhostUserMsg {
> @@ -1748,6 +1769,100 @@ vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
>      return 0;
>  }
>  
> +static int
> +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
> +                                    VhostUserMMap *vu_mmap,
> +                                    int fd)
> +{
> +    void *addr = 0;
> +    MemoryRegion *mr = NULL;
> +
> +    if (fd < 0) {
> +        error_report("Bad fd for map");
> +        return -EBADF;
> +    }
> +
> +    if (!dev->vdev->shmem_list ||
> +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> +        error_report("Device only has %d VIRTIO Shared Memory Regions. "
> +                     "Requested ID: %d",
> +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
> +        return -EFAULT;
> +    }
> +
> +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> +
> +    if (!mr) {
> +        error_report("VIRTIO Shared Memory Region at "
> +                     "ID %d unitialized", vu_mmap->shmid);
> +        return -EFAULT;
> +    }
> +
> +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> +                     vu_mmap->shm_offset, vu_mmap->len);
> +        return -EFAULT;
> +    }
> +
> +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> +
> +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,

Missing check for overlap between range [shm_offset, shm_offset + len)
and existing mappings.

> +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
> +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
> +        MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
> +
> +    if (addr == MAP_FAILED) {
> +        error_report("Failed to mmap mem fd");
> +        return -EFAULT;
> +    }
> +
> +    return 0;
> +}
> +
> +static int
> +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
> +                                      VhostUserMMap *vu_mmap)
> +{
> +    void *addr = 0;
> +    MemoryRegion *mr = NULL;
> +
> +    if (!dev->vdev->shmem_list ||
> +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> +        error_report("Device only has %d VIRTIO Shared Memory Regions. "
> +                     "Requested ID: %d",
> +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
> +        return -EFAULT;
> +    }
> +
> +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> +
> +    if (!mr) {
> +        error_report("VIRTIO Shared Memory Region at "
> +                     "ID %d unitialized", vu_mmap->shmid);
> +        return -EFAULT;
> +    }
> +
> +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> +                     vu_mmap->shm_offset, vu_mmap->len);
> +        return -EFAULT;
> +    }
> +
> +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> +
> +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,

Missing check for existing mapping with exact range [shm_offset, len)
match.

> +                PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
> +
> +    if (addr == MAP_FAILED) {
> +        error_report("Failed to unmap memory");
> +        return -EFAULT;
> +    }
> +
> +    return 0;
> +}
> +
>  static void close_backend_channel(struct vhost_user *u)
>  {
>      g_source_destroy(u->backend_src);
> @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc, GIOCondition condition,
>          ret = vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
>                                                               &hdr, &payload);
>          break;
> +    case VHOST_USER_BACKEND_SHMEM_MAP:
> +        ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
> +                                                  fd ? fd[0] : -1);
> +        break;
> +    case VHOST_USER_BACKEND_SHMEM_UNMAP:
> +        ret = vhost_user_backend_handle_shmem_unmap(dev, &payload.mmap);
> +        break;
>      default:
>          error_report("Received unexpected msg type: %d.", hdr.request);
>          ret = -EINVAL;
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 893a072c9d..9f2da5b11e 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
>      return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
>  }
>  
> +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
> +{
> +    MemoryRegion *mr = g_new0(MemoryRegion, 1);
> +    ++vdev->n_shmem_regions;
> +    vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
> +                               vdev->n_shmem_regions);

Where is shmem_list freed?

The name "list" is misleading since this is an array, not a list.

> +    vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
> +    return mr;
> +}

This looks weird. The contents of mr are copied into shmem_list[] and
then the pointer to mr is returned? Did you mean for the field's type to
be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
stash the pointer?

> +
>  /* A wrapper for use as a VMState .put function */
>  static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
>                                const VMStateField *field, JSONWriter *vmdesc)
> @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size)
>              virtio_vmstate_change, vdev);
>      vdev->device_endian = virtio_default_endian();
>      vdev->use_guest_notifier_mask = true;
> +    vdev->shmem_list = NULL;
> +    vdev->n_shmem_regions = 0;
>  }
>  
>  /*
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 7d5ffdc145..16d598aadc 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -165,6 +165,9 @@ struct VirtIODevice
>       */
>      EventNotifier config_notifier;
>      bool device_iotlb_enabled;
> +    /* Shared memory region for vhost-user mappings. */
> +    MemoryRegion *shmem_list;
> +    int n_shmem_regions;
>  };
>  
>  struct VirtioDeviceClass {
> @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
>  
>  int virtio_save(VirtIODevice *vdev, QEMUFile *f);
>  
> +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
> +
>  extern const VMStateInfo virtio_vmstate_info;
>  
>  #define VMSTATE_VIRTIO_DEVICE \
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index a879149fef..28556d183a 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN])
>      return vu_send_message(dev, &msg);
>  }
>  
> +bool
> +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> +             uint64_t shm_offset, uint64_t len, uint64_t flags)
> +{
> +    bool result = false;
> +    VhostUserMsg msg_reply;
> +    VhostUserMsg vmsg = {
> +        .request = VHOST_USER_BACKEND_SHMEM_MAP,
> +        .size = sizeof(vmsg.payload.mmap),
> +        .flags = VHOST_USER_VERSION,
> +        .payload.mmap = {
> +            .shmid = shmid,
> +            .fd_offset = fd_offset,
> +            .shm_offset = shm_offset,
> +            .len = len,
> +            .flags = flags,
> +        },
> +    };
> +
> +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> +    }
> +
> +    pthread_mutex_lock(&dev->backend_mutex);
> +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> +        pthread_mutex_unlock(&dev->backend_mutex);
> +        return false;
> +    }
> +
> +    /* Also unlocks the backend_mutex */
> +    return vu_process_message_reply(dev, &vmsg);
> +}
> +
> +bool
> +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> +               uint64_t shm_offset, uint64_t len)
> +{
> +    bool result = false;
> +    VhostUserMsg msg_reply;
> +    VhostUserMsg vmsg = {
> +        .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
> +        .size = sizeof(vmsg.payload.mmap),
> +        .flags = VHOST_USER_VERSION,
> +        .payload.mmap = {
> +            .shmid = shmid,
> +            .fd_offset = fd_offset,

What is the meaning of this field? I expected it to be set to 0.

> +            .shm_offset = shm_offset,
> +            .len = len,
> +        },
> +    };
> +
> +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> +    }
> +
> +    pthread_mutex_lock(&dev->backend_mutex);
> +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> +        pthread_mutex_unlock(&dev->backend_mutex);
> +        return false;
> +    }
> +
> +    /* Also unlocks the backend_mutex */
> +    return vu_process_message_reply(dev, &vmsg);
> +}
> +
>  static bool
>  vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
>  {
> diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
> index deb40e77b3..7f6c22cc1a 100644
> --- a/subprojects/libvhost-user/libvhost-user.h
> +++ b/subprojects/libvhost-user/libvhost-user.h
> @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
>      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
>      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
>      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
> +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
>      VHOST_USER_BACKEND_MAX
>  }  VhostUserBackendRequest;
>  
> @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
>      unsigned char uuid[UUID_LEN];
>  } VhostUserShared;
>  
> +/* For the flags field of VhostUserMMap */
> +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> +
> +typedef struct {
> +    /* VIRTIO Shared Memory Region ID */
> +    uint8_t shmid;
> +    uint8_t padding[7];
> +    /* File offset */
> +    uint64_t fd_offset;
> +    /* Offset within the VIRTIO Shared Memory Region */
> +    uint64_t shm_offset;
> +    /* Size of the mapping */
> +    uint64_t len;
> +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> +    uint64_t flags;
> +} VhostUserMMap;
> +
>  #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
>  # define VU_PACKED __attribute__((gcc_struct, packed))
>  #else
> @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
>          VhostUserVringArea area;
>          VhostUserInflight inflight;
>          VhostUserShared object;
> +        VhostUserMMap mmap;
>      } payload;
>  
>      int fds[VHOST_MEMORY_BASELINE_NREGIONS];
> @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
>   */
>  bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
>  
> +/**
> + * vu_shmem_map:
> + * @dev: a VuDev context
> + * @shmid: VIRTIO Shared Memory Region ID
> + * @fd_offset: File offset
> + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> + * @len: Size of the mapping
> + * @flags: Flags for the mmap operation
> + *
> + * Advertises a new mapping to be made in a given VIRTIO Shared Memory Region.
> + *
> + * Returns: TRUE on success, FALSE on failure.
> + */
> +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> +                  uint64_t shm_offset, uint64_t len, uint64_t flags);
> +
> +/**
> + * vu_shmem_map:
> + * @dev: a VuDev context
> + * @shmid: VIRTIO Shared Memory Region ID
> + * @fd_offset: File offset
> + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> + * @len: Size of the mapping
> + *
> + * The front-end un-mmaps a given range in the VIRTIO Shared Memory Region
> + * with the requested `shmid`.
> + *
> + * Returns: TRUE on success, FALSE on failure.
> + */
> +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> +                  uint64_t shm_offset, uint64_t len);
> +
>  /**
>   * vu_queue_set_notification:
>   * @dev: a VuDev context
> -- 
> 2.45.2
>
Albert Esteve Sept. 3, 2024, 9:54 a.m. UTC | #2
On Thu, Jul 11, 2024 at 9:45 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
> > Add SHMEM_MAP/UNMAP requests to vhost-user to
> > handle VIRTIO Shared Memory mappings.
> >
> > This request allows backends to dynamically map
> > fds into a VIRTIO Shared Memory Region indentified
> > by its `shmid`. Then, the fd memory is advertised
> > to the driver as a base addres + offset, so it
> > can be read/written (depending on the mmap flags
> > requested) while its valid.
> >
> > The backend can munmap the memory range
> > in a given VIRTIO Shared Memory Region (again,
> > identified by its `shmid`), to free it. Upon
> > receiving this message, the front-end must
> > mmap the regions with PROT_NONE to reserve
> > the virtual memory space.
> >
> > The device model needs to create MemoryRegion
> > instances for the VIRTIO Shared Memory Regions
> > and add them to the `VirtIODevice` instance.
> >
> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > ---
> >  docs/interop/vhost-user.rst               |  27 +++++
> >  hw/virtio/vhost-user.c                    | 122 ++++++++++++++++++++++
> >  hw/virtio/virtio.c                        |  12 +++
> >  include/hw/virtio/virtio.h                |   5 +
> >  subprojects/libvhost-user/libvhost-user.c |  65 ++++++++++++
> >  subprojects/libvhost-user/libvhost-user.h |  53 ++++++++++
> >  6 files changed, 284 insertions(+)
> >
> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > index d8419fd2f1..d52ba719d5 100644
> > --- a/docs/interop/vhost-user.rst
> > +++ b/docs/interop/vhost-user.rst
> > @@ -1859,6 +1859,33 @@ is sent by the front-end.
> >    when the operation is successful, or non-zero otherwise. Note that if
> the
> >    operation fails, no fd is sent to the backend.
> >
> > +``VHOST_USER_BACKEND_SHMEM_MAP``
> > +  :id: 9
> > +  :equivalent ioctl: N/A
> > +  :request payload: fd and ``struct VhostUserMMap``
> > +  :reply payload: N/A
> > +
> > +  This message can be submitted by the backends to advertise a new
> mapping
> > +  to be made in a given VIRTIO Shared Memory Region. Upon receiving the
> message,
> > +  The front-end will mmap the given fd into the VIRTIO Shared Memory
> Region
> > +  with the requested ``shmid``. A reply is generated indicating whether
> mapping
> > +  succeeded.
> > +
> > +  Mapping over an already existing map is not allowed and request shall
> fail.
> > +  Therefore, the memory range in the request must correspond with a
> valid,
> > +  free region of the VIRTIO Shared Memory Region.
> > +
> > +``VHOST_USER_BACKEND_SHMEM_UNMAP``
> > +  :id: 10
> > +  :equivalent ioctl: N/A
> > +  :request payload: ``struct VhostUserMMap``
> > +  :reply payload: N/A
> > +
> > +  This message can be submitted by the backends so that the front-end
> un-mmap
> > +  a given range (``offset``, ``len``) in the VIRTIO Shared Memory
> Region with
>
> s/offset/shm_offset/
>
> > +  the requested ``shmid``.
>
> Please clarify that <offset, len> must correspond to the entirety of a
> valid mapped region.
>
> By the way, the VIRTIO 1.3 gives the following behavior for the virtiofs
> DAX Window:
>
>   When a FUSE_SETUPMAPPING request perfectly overlaps a previous
>   mapping, the previous mapping is replaced. When a mapping partially
>   overlaps a previous mapping, the previous mapping is split into one or
>   two smaller mappings. When a mapping is partially unmapped it is also
>   split into one or two smaller mappings.
>
>   Establishing new mappings or splitting existing mappings consumes
>   resources. If the device runs out of resources the FUSE_SETUPMAPPING
>   request fails until resources are available again following
>   FUSE_REMOVEMAPPING.
>
> I think SETUPMAPPING/REMOVMAPPING can be implemented using
> SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
> partial ranges, but as far as I know that's not necessary for virtiofs
> in practice.
>
> It's worth mentioning that mappings consume resources and that SHMEM_MAP
> can fail when there are no resources available. The process-wide limit
> is vm.max_map_count on Linux although a vhost-user frontend may reduce
> it further to control vhost-user resource usage.
>
> > +  A reply is generated indicating whether unmapping succeeded.
> > +
> >  .. _reply_ack:
> >
> >  VHOST_USER_PROTOCOL_F_REPLY_ACK
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index cdf9af4a4b..7ee8a472c6 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> >      VHOST_USER_BACKEND_MAX
> >  }  VhostUserBackendRequest;
> >
> > @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
> >      unsigned char uuid[16];
> >  } VhostUserShared;
> >
> > +/* For the flags field of VhostUserMMap */
> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> > +
> > +typedef struct {
> > +    /* VIRTIO Shared Memory Region ID */
> > +    uint8_t shmid;
> > +    uint8_t padding[7];
> > +    /* File offset */
> > +    uint64_t fd_offset;
> > +    /* Offset within the VIRTIO Shared Memory Region */
> > +    uint64_t shm_offset;
> > +    /* Size of the mapping */
> > +    uint64_t len;
> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> > +    uint64_t flags;
> > +} VhostUserMMap;
> > +
> >  typedef struct {
> >      VhostUserRequest request;
> >
> > @@ -224,6 +244,7 @@ typedef union {
> >          VhostUserInflight inflight;
> >          VhostUserShared object;
> >          VhostUserTransferDeviceState transfer_state;
> > +        VhostUserMMap mmap;
> >  } VhostUserPayload;
> >
> >  typedef struct VhostUserMsg {
> > @@ -1748,6 +1769,100 @@
> vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
> >      return 0;
> >  }
> >
> > +static int
> > +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
> > +                                    VhostUserMMap *vu_mmap,
> > +                                    int fd)
> > +{
> > +    void *addr = 0;
> > +    MemoryRegion *mr = NULL;
> > +
> > +    if (fd < 0) {
> > +        error_report("Bad fd for map");
> > +        return -EBADF;
> > +    }
> > +
> > +    if (!dev->vdev->shmem_list ||
> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > +        error_report("Device only has %d VIRTIO Shared Memory Regions. "
> > +                     "Requested ID: %d",
> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
> > +        return -EFAULT;
> > +    }
> > +
> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > +
> > +    if (!mr) {
> > +        error_report("VIRTIO Shared Memory Region at "
> > +                     "ID %d unitialized", vu_mmap->shmid);
> > +        return -EFAULT;
> > +    }
> > +
> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> > +                     vu_mmap->shm_offset, vu_mmap->len);
> > +        return -EFAULT;
> > +    }
> > +
> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > +
> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>
> Missing check for overlap between range [shm_offset, shm_offset + len)
> and existing mappings.
>

Not sure how to do this check. Specifically, I am not sure how previous
ranges are stored within the MemoryRegion. Is looping through mr->subregions
a valid option?


>
> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
> > +        MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
> > +
> > +    if (addr == MAP_FAILED) {
> > +        error_report("Failed to mmap mem fd");
> > +        return -EFAULT;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int
> > +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
> > +                                      VhostUserMMap *vu_mmap)
> > +{
> > +    void *addr = 0;
> > +    MemoryRegion *mr = NULL;
> > +
> > +    if (!dev->vdev->shmem_list ||
> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > +        error_report("Device only has %d VIRTIO Shared Memory Regions. "
> > +                     "Requested ID: %d",
> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
> > +        return -EFAULT;
> > +    }
> > +
> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > +
> > +    if (!mr) {
> > +        error_report("VIRTIO Shared Memory Region at "
> > +                     "ID %d unitialized", vu_mmap->shmid);
> > +        return -EFAULT;
> > +    }
> > +
> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> > +                     vu_mmap->shm_offset, vu_mmap->len);
> > +        return -EFAULT;
> > +    }
> > +
> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > +
> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>
> Missing check for existing mapping with exact range [shm_offset, len)
> match.
>
> > +                PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1,
> 0);
> > +
> > +    if (addr == MAP_FAILED) {
> > +        error_report("Failed to unmap memory");
> > +        return -EFAULT;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  static void close_backend_channel(struct vhost_user *u)
> >  {
> >      g_source_destroy(u->backend_src);
> > @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc,
> GIOCondition condition,
> >          ret =
> vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
> >                                                               &hdr,
> &payload);
> >          break;
> > +    case VHOST_USER_BACKEND_SHMEM_MAP:
> > +        ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
> > +                                                  fd ? fd[0] : -1);
> > +        break;
> > +    case VHOST_USER_BACKEND_SHMEM_UNMAP:
> > +        ret = vhost_user_backend_handle_shmem_unmap(dev, &payload.mmap);
> > +        break;
> >      default:
> >          error_report("Received unexpected msg type: %d.", hdr.request);
> >          ret = -EINVAL;
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index 893a072c9d..9f2da5b11e 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
> >      return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
> >  }
> >
> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
> > +{
> > +    MemoryRegion *mr = g_new0(MemoryRegion, 1);
> > +    ++vdev->n_shmem_regions;
> > +    vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
> > +                               vdev->n_shmem_regions);
>
> Where is shmem_list freed?
>
> The name "list" is misleading since this is an array, not a list.
>
> > +    vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
> > +    return mr;
> > +}
>
> This looks weird. The contents of mr are copied into shmem_list[] and
> then the pointer to mr is returned? Did you mean for the field's type to
> be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
> stash the pointer?
>
> > +
> >  /* A wrapper for use as a VMState .put function */
> >  static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
> >                                const VMStateField *field, JSONWriter
> *vmdesc)
> > @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t
> device_id, size_t config_size)
> >              virtio_vmstate_change, vdev);
> >      vdev->device_endian = virtio_default_endian();
> >      vdev->use_guest_notifier_mask = true;
> > +    vdev->shmem_list = NULL;
> > +    vdev->n_shmem_regions = 0;
> >  }
> >
> >  /*
> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > index 7d5ffdc145..16d598aadc 100644
> > --- a/include/hw/virtio/virtio.h
> > +++ b/include/hw/virtio/virtio.h
> > @@ -165,6 +165,9 @@ struct VirtIODevice
> >       */
> >      EventNotifier config_notifier;
> >      bool device_iotlb_enabled;
> > +    /* Shared memory region for vhost-user mappings. */
> > +    MemoryRegion *shmem_list;
> > +    int n_shmem_regions;
> >  };
> >
> >  struct VirtioDeviceClass {
> > @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue
> *vq);
> >
> >  int virtio_save(VirtIODevice *vdev, QEMUFile *f);
> >
> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
> > +
> >  extern const VMStateInfo virtio_vmstate_info;
> >
> >  #define VMSTATE_VIRTIO_DEVICE \
> > diff --git a/subprojects/libvhost-user/libvhost-user.c
> b/subprojects/libvhost-user/libvhost-user.c
> > index a879149fef..28556d183a 100644
> > --- a/subprojects/libvhost-user/libvhost-user.c
> > +++ b/subprojects/libvhost-user/libvhost-user.c
> > @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char
> uuid[UUID_LEN])
> >      return vu_send_message(dev, &msg);
> >  }
> >
> > +bool
> > +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > +             uint64_t shm_offset, uint64_t len, uint64_t flags)
> > +{
> > +    bool result = false;
> > +    VhostUserMsg msg_reply;
> > +    VhostUserMsg vmsg = {
> > +        .request = VHOST_USER_BACKEND_SHMEM_MAP,
> > +        .size = sizeof(vmsg.payload.mmap),
> > +        .flags = VHOST_USER_VERSION,
> > +        .payload.mmap = {
> > +            .shmid = shmid,
> > +            .fd_offset = fd_offset,
> > +            .shm_offset = shm_offset,
> > +            .len = len,
> > +            .flags = flags,
> > +        },
> > +    };
> > +
> > +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> > +    }
> > +
> > +    pthread_mutex_lock(&dev->backend_mutex);
> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> > +        pthread_mutex_unlock(&dev->backend_mutex);
> > +        return false;
> > +    }
> > +
> > +    /* Also unlocks the backend_mutex */
> > +    return vu_process_message_reply(dev, &vmsg);
> > +}
> > +
> > +bool
> > +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > +               uint64_t shm_offset, uint64_t len)
> > +{
> > +    bool result = false;
> > +    VhostUserMsg msg_reply;
> > +    VhostUserMsg vmsg = {
> > +        .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
> > +        .size = sizeof(vmsg.payload.mmap),
> > +        .flags = VHOST_USER_VERSION,
> > +        .payload.mmap = {
> > +            .shmid = shmid,
> > +            .fd_offset = fd_offset,
>
> What is the meaning of this field? I expected it to be set to 0.
>
> > +            .shm_offset = shm_offset,
> > +            .len = len,
> > +        },
> > +    };
> > +
> > +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> > +    }
> > +
> > +    pthread_mutex_lock(&dev->backend_mutex);
> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> > +        pthread_mutex_unlock(&dev->backend_mutex);
> > +        return false;
> > +    }
> > +
> > +    /* Also unlocks the backend_mutex */
> > +    return vu_process_message_reply(dev, &vmsg);
> > +}
> > +
> >  static bool
> >  vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
> >  {
> > diff --git a/subprojects/libvhost-user/libvhost-user.h
> b/subprojects/libvhost-user/libvhost-user.h
> > index deb40e77b3..7f6c22cc1a 100644
> > --- a/subprojects/libvhost-user/libvhost-user.h
> > +++ b/subprojects/libvhost-user/libvhost-user.h
> > @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> >      VHOST_USER_BACKEND_MAX
> >  }  VhostUserBackendRequest;
> >
> > @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
> >      unsigned char uuid[UUID_LEN];
> >  } VhostUserShared;
> >
> > +/* For the flags field of VhostUserMMap */
> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> > +
> > +typedef struct {
> > +    /* VIRTIO Shared Memory Region ID */
> > +    uint8_t shmid;
> > +    uint8_t padding[7];
> > +    /* File offset */
> > +    uint64_t fd_offset;
> > +    /* Offset within the VIRTIO Shared Memory Region */
> > +    uint64_t shm_offset;
> > +    /* Size of the mapping */
> > +    uint64_t len;
> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> > +    uint64_t flags;
> > +} VhostUserMMap;
> > +
> >  #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
> >  # define VU_PACKED __attribute__((gcc_struct, packed))
> >  #else
> > @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
> >          VhostUserVringArea area;
> >          VhostUserInflight inflight;
> >          VhostUserShared object;
> > +        VhostUserMMap mmap;
> >      } payload;
> >
> >      int fds[VHOST_MEMORY_BASELINE_NREGIONS];
> > @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned char
> uuid[UUID_LEN]);
> >   */
> >  bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
> >
> > +/**
> > + * vu_shmem_map:
> > + * @dev: a VuDev context
> > + * @shmid: VIRTIO Shared Memory Region ID
> > + * @fd_offset: File offset
> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> > + * @len: Size of the mapping
> > + * @flags: Flags for the mmap operation
> > + *
> > + * Advertises a new mapping to be made in a given VIRTIO Shared Memory
> Region.
> > + *
> > + * Returns: TRUE on success, FALSE on failure.
> > + */
> > +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > +                  uint64_t shm_offset, uint64_t len, uint64_t flags);
> > +
> > +/**
> > + * vu_shmem_map:
> > + * @dev: a VuDev context
> > + * @shmid: VIRTIO Shared Memory Region ID
> > + * @fd_offset: File offset
> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> > + * @len: Size of the mapping
> > + *
> > + * The front-end un-mmaps a given range in the VIRTIO Shared Memory
> Region
> > + * with the requested `shmid`.
> > + *
> > + * Returns: TRUE on success, FALSE on failure.
> > + */
> > +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > +                  uint64_t shm_offset, uint64_t len);
> > +
> >  /**
> >   * vu_queue_set_notification:
> >   * @dev: a VuDev context
> > --
> > 2.45.2
> >
>
Albert Esteve Sept. 3, 2024, 11:54 a.m. UTC | #3
On Tue, Sep 3, 2024 at 11:54 AM Albert Esteve <aesteve@redhat.com> wrote:

>
>
> On Thu, Jul 11, 2024 at 9:45 AM Stefan Hajnoczi <stefanha@redhat.com>
> wrote:
>
>> On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
>> > Add SHMEM_MAP/UNMAP requests to vhost-user to
>> > handle VIRTIO Shared Memory mappings.
>> >
>> > This request allows backends to dynamically map
>> > fds into a VIRTIO Shared Memory Region indentified
>> > by its `shmid`. Then, the fd memory is advertised
>> > to the driver as a base addres + offset, so it
>> > can be read/written (depending on the mmap flags
>> > requested) while its valid.
>> >
>> > The backend can munmap the memory range
>> > in a given VIRTIO Shared Memory Region (again,
>> > identified by its `shmid`), to free it. Upon
>> > receiving this message, the front-end must
>> > mmap the regions with PROT_NONE to reserve
>> > the virtual memory space.
>> >
>> > The device model needs to create MemoryRegion
>> > instances for the VIRTIO Shared Memory Regions
>> > and add them to the `VirtIODevice` instance.
>> >
>> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
>> > ---
>> >  docs/interop/vhost-user.rst               |  27 +++++
>> >  hw/virtio/vhost-user.c                    | 122 ++++++++++++++++++++++
>> >  hw/virtio/virtio.c                        |  12 +++
>> >  include/hw/virtio/virtio.h                |   5 +
>> >  subprojects/libvhost-user/libvhost-user.c |  65 ++++++++++++
>> >  subprojects/libvhost-user/libvhost-user.h |  53 ++++++++++
>> >  6 files changed, 284 insertions(+)
>> >
>> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
>> > index d8419fd2f1..d52ba719d5 100644
>> > --- a/docs/interop/vhost-user.rst
>> > +++ b/docs/interop/vhost-user.rst
>> > @@ -1859,6 +1859,33 @@ is sent by the front-end.
>> >    when the operation is successful, or non-zero otherwise. Note that
>> if the
>> >    operation fails, no fd is sent to the backend.
>> >
>> > +``VHOST_USER_BACKEND_SHMEM_MAP``
>> > +  :id: 9
>> > +  :equivalent ioctl: N/A
>> > +  :request payload: fd and ``struct VhostUserMMap``
>> > +  :reply payload: N/A
>> > +
>> > +  This message can be submitted by the backends to advertise a new
>> mapping
>> > +  to be made in a given VIRTIO Shared Memory Region. Upon receiving
>> the message,
>> > +  The front-end will mmap the given fd into the VIRTIO Shared Memory
>> Region
>> > +  with the requested ``shmid``. A reply is generated indicating
>> whether mapping
>> > +  succeeded.
>> > +
>> > +  Mapping over an already existing map is not allowed and request
>> shall fail.
>> > +  Therefore, the memory range in the request must correspond with a
>> valid,
>> > +  free region of the VIRTIO Shared Memory Region.
>> > +
>> > +``VHOST_USER_BACKEND_SHMEM_UNMAP``
>> > +  :id: 10
>> > +  :equivalent ioctl: N/A
>> > +  :request payload: ``struct VhostUserMMap``
>> > +  :reply payload: N/A
>> > +
>> > +  This message can be submitted by the backends so that the front-end
>> un-mmap
>> > +  a given range (``offset``, ``len``) in the VIRTIO Shared Memory
>> Region with
>>
>> s/offset/shm_offset/
>>
>> > +  the requested ``shmid``.
>>
>> Please clarify that <offset, len> must correspond to the entirety of a
>> valid mapped region.
>>
>> By the way, the VIRTIO 1.3 gives the following behavior for the virtiofs
>> DAX Window:
>>
>>   When a FUSE_SETUPMAPPING request perfectly overlaps a previous
>>   mapping, the previous mapping is replaced. When a mapping partially
>>   overlaps a previous mapping, the previous mapping is split into one or
>>   two smaller mappings. When a mapping is partially unmapped it is also
>>   split into one or two smaller mappings.
>>
>>   Establishing new mappings or splitting existing mappings consumes
>>   resources. If the device runs out of resources the FUSE_SETUPMAPPING
>>   request fails until resources are available again following
>>   FUSE_REMOVEMAPPING.
>>
>> I think SETUPMAPPING/REMOVMAPPING can be implemented using
>> SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
>> partial ranges, but as far as I know that's not necessary for virtiofs
>> in practice.
>>
>> It's worth mentioning that mappings consume resources and that SHMEM_MAP
>> can fail when there are no resources available. The process-wide limit
>> is vm.max_map_count on Linux although a vhost-user frontend may reduce
>> it further to control vhost-user resource usage.
>>
>> > +  A reply is generated indicating whether unmapping succeeded.
>> > +
>> >  .. _reply_ack:
>> >
>> >  VHOST_USER_PROTOCOL_F_REPLY_ACK
>> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>> > index cdf9af4a4b..7ee8a472c6 100644
>> > --- a/hw/virtio/vhost-user.c
>> > +++ b/hw/virtio/vhost-user.c
>> > @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
>> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
>> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
>> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
>> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
>> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
>> >      VHOST_USER_BACKEND_MAX
>> >  }  VhostUserBackendRequest;
>> >
>> > @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
>> >      unsigned char uuid[16];
>> >  } VhostUserShared;
>> >
>> > +/* For the flags field of VhostUserMMap */
>> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
>> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
>> > +
>> > +typedef struct {
>> > +    /* VIRTIO Shared Memory Region ID */
>> > +    uint8_t shmid;
>> > +    uint8_t padding[7];
>> > +    /* File offset */
>> > +    uint64_t fd_offset;
>> > +    /* Offset within the VIRTIO Shared Memory Region */
>> > +    uint64_t shm_offset;
>> > +    /* Size of the mapping */
>> > +    uint64_t len;
>> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
>> > +    uint64_t flags;
>> > +} VhostUserMMap;
>> > +
>> >  typedef struct {
>> >      VhostUserRequest request;
>> >
>> > @@ -224,6 +244,7 @@ typedef union {
>> >          VhostUserInflight inflight;
>> >          VhostUserShared object;
>> >          VhostUserTransferDeviceState transfer_state;
>> > +        VhostUserMMap mmap;
>> >  } VhostUserPayload;
>> >
>> >  typedef struct VhostUserMsg {
>> > @@ -1748,6 +1769,100 @@
>> vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
>> >      return 0;
>> >  }
>> >
>> > +static int
>> > +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
>> > +                                    VhostUserMMap *vu_mmap,
>> > +                                    int fd)
>> > +{
>> > +    void *addr = 0;
>> > +    MemoryRegion *mr = NULL;
>> > +
>> > +    if (fd < 0) {
>> > +        error_report("Bad fd for map");
>> > +        return -EBADF;
>> > +    }
>> > +
>> > +    if (!dev->vdev->shmem_list ||
>> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
>> > +        error_report("Device only has %d VIRTIO Shared Memory Regions.
>> "
>> > +                     "Requested ID: %d",
>> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
>> > +
>> > +    if (!mr) {
>> > +        error_report("VIRTIO Shared Memory Region at "
>> > +                     "ID %d unitialized", vu_mmap->shmid);
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
>> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
>> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
>> > +                     vu_mmap->shm_offset, vu_mmap->len);
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
>> > +
>> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>>
>> Missing check for overlap between range [shm_offset, shm_offset + len)
>> and existing mappings.
>>
>
> Not sure how to do this check. Specifically, I am not sure how previous
> ranges are stored within the MemoryRegion. Is looping through
> mr->subregions
> a valid option?
>

Maybe something like this would do?
```
     if (memory_region_find(mr, vu_mmap->shm_offset, vu_mmap->len).mr) {
        error_report("Requested memory (%" PRIx64 "+%" PRIx64 " overalps "
                     "with previously mapped memory",
                     vu_mmap->shm_offset, vu_mmap->len);
        return -EFAULT;
    }
```

>
>
>>
>> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
>> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
>> > +        MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
>> > +
>> > +    if (addr == MAP_FAILED) {
>> > +        error_report("Failed to mmap mem fd");
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    return 0;
>> > +}
>> > +
>> > +static int
>> > +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
>> > +                                      VhostUserMMap *vu_mmap)
>> > +{
>> > +    void *addr = 0;
>> > +    MemoryRegion *mr = NULL;
>> > +
>> > +    if (!dev->vdev->shmem_list ||
>> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
>> > +        error_report("Device only has %d VIRTIO Shared Memory Regions.
>> "
>> > +                     "Requested ID: %d",
>> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
>> > +
>> > +    if (!mr) {
>> > +        error_report("VIRTIO Shared Memory Region at "
>> > +                     "ID %d unitialized", vu_mmap->shmid);
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
>> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
>> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
>> > +                     vu_mmap->shm_offset, vu_mmap->len);
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
>> > +
>> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>>
>> Missing check for existing mapping with exact range [shm_offset, len)
>> match.
>>
>> > +                PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
>> -1, 0);
>> > +
>> > +    if (addr == MAP_FAILED) {
>> > +        error_report("Failed to unmap memory");
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    return 0;
>> > +}
>> > +
>> >  static void close_backend_channel(struct vhost_user *u)
>> >  {
>> >      g_source_destroy(u->backend_src);
>> > @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc,
>> GIOCondition condition,
>> >          ret =
>> vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
>> >                                                               &hdr,
>> &payload);
>> >          break;
>> > +    case VHOST_USER_BACKEND_SHMEM_MAP:
>> > +        ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
>> > +                                                  fd ? fd[0] : -1);
>> > +        break;
>> > +    case VHOST_USER_BACKEND_SHMEM_UNMAP:
>> > +        ret = vhost_user_backend_handle_shmem_unmap(dev,
>> &payload.mmap);
>> > +        break;
>> >      default:
>> >          error_report("Received unexpected msg type: %d.", hdr.request);
>> >          ret = -EINVAL;
>> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> > index 893a072c9d..9f2da5b11e 100644
>> > --- a/hw/virtio/virtio.c
>> > +++ b/hw/virtio/virtio.c
>> > @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
>> >      return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
>> >  }
>> >
>> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
>> > +{
>> > +    MemoryRegion *mr = g_new0(MemoryRegion, 1);
>> > +    ++vdev->n_shmem_regions;
>> > +    vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
>> > +                               vdev->n_shmem_regions);
>>
>> Where is shmem_list freed?
>>
>> The name "list" is misleading since this is an array, not a list.
>>
>> > +    vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
>> > +    return mr;
>> > +}
>>
>> This looks weird. The contents of mr are copied into shmem_list[] and
>> then the pointer to mr is returned? Did you mean for the field's type to
>> be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
>> stash the pointer?
>>
>> > +
>> >  /* A wrapper for use as a VMState .put function */
>> >  static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
>> >                                const VMStateField *field, JSONWriter
>> *vmdesc)
>> > @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t
>> device_id, size_t config_size)
>> >              virtio_vmstate_change, vdev);
>> >      vdev->device_endian = virtio_default_endian();
>> >      vdev->use_guest_notifier_mask = true;
>> > +    vdev->shmem_list = NULL;
>> > +    vdev->n_shmem_regions = 0;
>> >  }
>> >
>> >  /*
>> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>> > index 7d5ffdc145..16d598aadc 100644
>> > --- a/include/hw/virtio/virtio.h
>> > +++ b/include/hw/virtio/virtio.h
>> > @@ -165,6 +165,9 @@ struct VirtIODevice
>> >       */
>> >      EventNotifier config_notifier;
>> >      bool device_iotlb_enabled;
>> > +    /* Shared memory region for vhost-user mappings. */
>> > +    MemoryRegion *shmem_list;
>> > +    int n_shmem_regions;
>> >  };
>> >
>> >  struct VirtioDeviceClass {
>> > @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue
>> *vq);
>> >
>> >  int virtio_save(VirtIODevice *vdev, QEMUFile *f);
>> >
>> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
>> > +
>> >  extern const VMStateInfo virtio_vmstate_info;
>> >
>> >  #define VMSTATE_VIRTIO_DEVICE \
>> > diff --git a/subprojects/libvhost-user/libvhost-user.c
>> b/subprojects/libvhost-user/libvhost-user.c
>> > index a879149fef..28556d183a 100644
>> > --- a/subprojects/libvhost-user/libvhost-user.c
>> > +++ b/subprojects/libvhost-user/libvhost-user.c
>> > @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char
>> uuid[UUID_LEN])
>> >      return vu_send_message(dev, &msg);
>> >  }
>> >
>> > +bool
>> > +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > +             uint64_t shm_offset, uint64_t len, uint64_t flags)
>> > +{
>> > +    bool result = false;
>> > +    VhostUserMsg msg_reply;
>> > +    VhostUserMsg vmsg = {
>> > +        .request = VHOST_USER_BACKEND_SHMEM_MAP,
>> > +        .size = sizeof(vmsg.payload.mmap),
>> > +        .flags = VHOST_USER_VERSION,
>> > +        .payload.mmap = {
>> > +            .shmid = shmid,
>> > +            .fd_offset = fd_offset,
>> > +            .shm_offset = shm_offset,
>> > +            .len = len,
>> > +            .flags = flags,
>> > +        },
>> > +    };
>> > +
>> > +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK))
>> {
>> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
>> > +    }
>> > +
>> > +    pthread_mutex_lock(&dev->backend_mutex);
>> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
>> > +        pthread_mutex_unlock(&dev->backend_mutex);
>> > +        return false;
>> > +    }
>> > +
>> > +    /* Also unlocks the backend_mutex */
>> > +    return vu_process_message_reply(dev, &vmsg);
>> > +}
>> > +
>> > +bool
>> > +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > +               uint64_t shm_offset, uint64_t len)
>> > +{
>> > +    bool result = false;
>> > +    VhostUserMsg msg_reply;
>> > +    VhostUserMsg vmsg = {
>> > +        .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
>> > +        .size = sizeof(vmsg.payload.mmap),
>> > +        .flags = VHOST_USER_VERSION,
>> > +        .payload.mmap = {
>> > +            .shmid = shmid,
>> > +            .fd_offset = fd_offset,
>>
>> What is the meaning of this field? I expected it to be set to 0.
>>
>> > +            .shm_offset = shm_offset,
>> > +            .len = len,
>> > +        },
>> > +    };
>> > +
>> > +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK))
>> {
>> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
>> > +    }
>> > +
>> > +    pthread_mutex_lock(&dev->backend_mutex);
>> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
>> > +        pthread_mutex_unlock(&dev->backend_mutex);
>> > +        return false;
>> > +    }
>> > +
>> > +    /* Also unlocks the backend_mutex */
>> > +    return vu_process_message_reply(dev, &vmsg);
>> > +}
>> > +
>> >  static bool
>> >  vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
>> >  {
>> > diff --git a/subprojects/libvhost-user/libvhost-user.h
>> b/subprojects/libvhost-user/libvhost-user.h
>> > index deb40e77b3..7f6c22cc1a 100644
>> > --- a/subprojects/libvhost-user/libvhost-user.h
>> > +++ b/subprojects/libvhost-user/libvhost-user.h
>> > @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
>> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
>> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
>> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
>> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
>> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
>> >      VHOST_USER_BACKEND_MAX
>> >  }  VhostUserBackendRequest;
>> >
>> > @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
>> >      unsigned char uuid[UUID_LEN];
>> >  } VhostUserShared;
>> >
>> > +/* For the flags field of VhostUserMMap */
>> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
>> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
>> > +
>> > +typedef struct {
>> > +    /* VIRTIO Shared Memory Region ID */
>> > +    uint8_t shmid;
>> > +    uint8_t padding[7];
>> > +    /* File offset */
>> > +    uint64_t fd_offset;
>> > +    /* Offset within the VIRTIO Shared Memory Region */
>> > +    uint64_t shm_offset;
>> > +    /* Size of the mapping */
>> > +    uint64_t len;
>> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
>> > +    uint64_t flags;
>> > +} VhostUserMMap;
>> > +
>> >  #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
>> >  # define VU_PACKED __attribute__((gcc_struct, packed))
>> >  #else
>> > @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
>> >          VhostUserVringArea area;
>> >          VhostUserInflight inflight;
>> >          VhostUserShared object;
>> > +        VhostUserMMap mmap;
>> >      } payload;
>> >
>> >      int fds[VHOST_MEMORY_BASELINE_NREGIONS];
>> > @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned
>> char uuid[UUID_LEN]);
>> >   */
>> >  bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
>> >
>> > +/**
>> > + * vu_shmem_map:
>> > + * @dev: a VuDev context
>> > + * @shmid: VIRTIO Shared Memory Region ID
>> > + * @fd_offset: File offset
>> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
>> > + * @len: Size of the mapping
>> > + * @flags: Flags for the mmap operation
>> > + *
>> > + * Advertises a new mapping to be made in a given VIRTIO Shared Memory
>> Region.
>> > + *
>> > + * Returns: TRUE on success, FALSE on failure.
>> > + */
>> > +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > +                  uint64_t shm_offset, uint64_t len, uint64_t flags);
>> > +
>> > +/**
>> > + * vu_shmem_map:
>> > + * @dev: a VuDev context
>> > + * @shmid: VIRTIO Shared Memory Region ID
>> > + * @fd_offset: File offset
>> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
>> > + * @len: Size of the mapping
>> > + *
>> > + * The front-end un-mmaps a given range in the VIRTIO Shared Memory
>> Region
>> > + * with the requested `shmid`.
>> > + *
>> > + * Returns: TRUE on success, FALSE on failure.
>> > + */
>> > +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > +                  uint64_t shm_offset, uint64_t len);
>> > +
>> >  /**
>> >   * vu_queue_set_notification:
>> >   * @dev: a VuDev context
>> > --
>> > 2.45.2
>> >
>>
>
Albert Esteve Sept. 4, 2024, 7:28 a.m. UTC | #4
On Thu, Jul 11, 2024 at 9:45 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
> > Add SHMEM_MAP/UNMAP requests to vhost-user to
> > handle VIRTIO Shared Memory mappings.
> >
> > This request allows backends to dynamically map
> > fds into a VIRTIO Shared Memory Region indentified
> > by its `shmid`. Then, the fd memory is advertised
> > to the driver as a base addres + offset, so it
> > can be read/written (depending on the mmap flags
> > requested) while its valid.
> >
> > The backend can munmap the memory range
> > in a given VIRTIO Shared Memory Region (again,
> > identified by its `shmid`), to free it. Upon
> > receiving this message, the front-end must
> > mmap the regions with PROT_NONE to reserve
> > the virtual memory space.
> >
> > The device model needs to create MemoryRegion
> > instances for the VIRTIO Shared Memory Regions
> > and add them to the `VirtIODevice` instance.
> >
> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > ---
> >  docs/interop/vhost-user.rst               |  27 +++++
> >  hw/virtio/vhost-user.c                    | 122 ++++++++++++++++++++++
> >  hw/virtio/virtio.c                        |  12 +++
> >  include/hw/virtio/virtio.h                |   5 +
> >  subprojects/libvhost-user/libvhost-user.c |  65 ++++++++++++
> >  subprojects/libvhost-user/libvhost-user.h |  53 ++++++++++
> >  6 files changed, 284 insertions(+)
> >
> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> > index d8419fd2f1..d52ba719d5 100644
> > --- a/docs/interop/vhost-user.rst
> > +++ b/docs/interop/vhost-user.rst
> > @@ -1859,6 +1859,33 @@ is sent by the front-end.
> >    when the operation is successful, or non-zero otherwise. Note that if
> the
> >    operation fails, no fd is sent to the backend.
> >
> > +``VHOST_USER_BACKEND_SHMEM_MAP``
> > +  :id: 9
> > +  :equivalent ioctl: N/A
> > +  :request payload: fd and ``struct VhostUserMMap``
> > +  :reply payload: N/A
> > +
> > +  This message can be submitted by the backends to advertise a new
> mapping
> > +  to be made in a given VIRTIO Shared Memory Region. Upon receiving the
> message,
> > +  The front-end will mmap the given fd into the VIRTIO Shared Memory
> Region
> > +  with the requested ``shmid``. A reply is generated indicating whether
> mapping
> > +  succeeded.
> > +
> > +  Mapping over an already existing map is not allowed and request shall
> fail.
> > +  Therefore, the memory range in the request must correspond with a
> valid,
> > +  free region of the VIRTIO Shared Memory Region.
> > +
> > +``VHOST_USER_BACKEND_SHMEM_UNMAP``
> > +  :id: 10
> > +  :equivalent ioctl: N/A
> > +  :request payload: ``struct VhostUserMMap``
> > +  :reply payload: N/A
> > +
> > +  This message can be submitted by the backends so that the front-end
> un-mmap
> > +  a given range (``offset``, ``len``) in the VIRTIO Shared Memory
> Region with
>
> s/offset/shm_offset/
>
> > +  the requested ``shmid``.
>
> Please clarify that <offset, len> must correspond to the entirety of a
> valid mapped region.
>
> By the way, the VIRTIO 1.3 gives the following behavior for the virtiofs
> DAX Window:
>
>   When a FUSE_SETUPMAPPING request perfectly overlaps a previous
>   mapping, the previous mapping is replaced. When a mapping partially
>   overlaps a previous mapping, the previous mapping is split into one or
>   two smaller mappings. When a mapping is partially unmapped it is also
>   split into one or two smaller mappings.
>
>   Establishing new mappings or splitting existing mappings consumes
>   resources. If the device runs out of resources the FUSE_SETUPMAPPING
>   request fails until resources are available again following
>   FUSE_REMOVEMAPPING.
>
> I think SETUPMAPPING/REMOVMAPPING can be implemented using
> SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
> partial ranges, but as far as I know that's not necessary for virtiofs
> in practice.
>
> It's worth mentioning that mappings consume resources and that SHMEM_MAP
> can fail when there are no resources available. The process-wide limit
> is vm.max_map_count on Linux although a vhost-user frontend may reduce
> it further to control vhost-user resource usage.
>
> > +  A reply is generated indicating whether unmapping succeeded.
> > +
> >  .. _reply_ack:
> >
> >  VHOST_USER_PROTOCOL_F_REPLY_ACK
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index cdf9af4a4b..7ee8a472c6 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> >      VHOST_USER_BACKEND_MAX
> >  }  VhostUserBackendRequest;
> >
> > @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
> >      unsigned char uuid[16];
> >  } VhostUserShared;
> >
> > +/* For the flags field of VhostUserMMap */
> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> > +
> > +typedef struct {
> > +    /* VIRTIO Shared Memory Region ID */
> > +    uint8_t shmid;
> > +    uint8_t padding[7];
> > +    /* File offset */
> > +    uint64_t fd_offset;
> > +    /* Offset within the VIRTIO Shared Memory Region */
> > +    uint64_t shm_offset;
> > +    /* Size of the mapping */
> > +    uint64_t len;
> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> > +    uint64_t flags;
> > +} VhostUserMMap;
> > +
> >  typedef struct {
> >      VhostUserRequest request;
> >
> > @@ -224,6 +244,7 @@ typedef union {
> >          VhostUserInflight inflight;
> >          VhostUserShared object;
> >          VhostUserTransferDeviceState transfer_state;
> > +        VhostUserMMap mmap;
> >  } VhostUserPayload;
> >
> >  typedef struct VhostUserMsg {
> > @@ -1748,6 +1769,100 @@
> vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
> >      return 0;
> >  }
> >
> > +static int
> > +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
> > +                                    VhostUserMMap *vu_mmap,
> > +                                    int fd)
> > +{
> > +    void *addr = 0;
> > +    MemoryRegion *mr = NULL;
> > +
> > +    if (fd < 0) {
> > +        error_report("Bad fd for map");
> > +        return -EBADF;
> > +    }
> > +
> > +    if (!dev->vdev->shmem_list ||
> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > +        error_report("Device only has %d VIRTIO Shared Memory Regions. "
> > +                     "Requested ID: %d",
> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
> > +        return -EFAULT;
> > +    }
> > +
> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > +
> > +    if (!mr) {
> > +        error_report("VIRTIO Shared Memory Region at "
> > +                     "ID %d unitialized", vu_mmap->shmid);
> > +        return -EFAULT;
> > +    }
> > +
> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> > +                     vu_mmap->shm_offset, vu_mmap->len);
> > +        return -EFAULT;
> > +    }
> > +
> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > +
> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>
> Missing check for overlap between range [shm_offset, shm_offset + len)
> and existing mappings.
>
> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
> > +        MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
> > +
> > +    if (addr == MAP_FAILED) {
> > +        error_report("Failed to mmap mem fd");
> > +        return -EFAULT;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int
> > +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
> > +                                      VhostUserMMap *vu_mmap)
> > +{
> > +    void *addr = 0;
> > +    MemoryRegion *mr = NULL;
> > +
> > +    if (!dev->vdev->shmem_list ||
> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > +        error_report("Device only has %d VIRTIO Shared Memory Regions. "
> > +                     "Requested ID: %d",
> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
> > +        return -EFAULT;
> > +    }
> > +
> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > +
> > +    if (!mr) {
> > +        error_report("VIRTIO Shared Memory Region at "
> > +                     "ID %d unitialized", vu_mmap->shmid);
> > +        return -EFAULT;
> > +    }
> > +
> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> > +                     vu_mmap->shm_offset, vu_mmap->len);
> > +        return -EFAULT;
> > +    }
> > +
> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > +
> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>
> Missing check for existing mapping with exact range [shm_offset, len)
> match.
>
> > +                PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1,
> 0);
> > +
> > +    if (addr == MAP_FAILED) {
> > +        error_report("Failed to unmap memory");
> > +        return -EFAULT;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  static void close_backend_channel(struct vhost_user *u)
> >  {
> >      g_source_destroy(u->backend_src);
> > @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc,
> GIOCondition condition,
> >          ret =
> vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
> >                                                               &hdr,
> &payload);
> >          break;
> > +    case VHOST_USER_BACKEND_SHMEM_MAP:
> > +        ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
> > +                                                  fd ? fd[0] : -1);
> > +        break;
> > +    case VHOST_USER_BACKEND_SHMEM_UNMAP:
> > +        ret = vhost_user_backend_handle_shmem_unmap(dev, &payload.mmap);
> > +        break;
> >      default:
> >          error_report("Received unexpected msg type: %d.", hdr.request);
> >          ret = -EINVAL;
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index 893a072c9d..9f2da5b11e 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
> >      return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
> >  }
> >
> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
> > +{
> > +    MemoryRegion *mr = g_new0(MemoryRegion, 1);
> > +    ++vdev->n_shmem_regions;
> > +    vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
> > +                               vdev->n_shmem_regions);
>
> Where is shmem_list freed?
>
> The name "list" is misleading since this is an array, not a list.
>
> > +    vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
> > +    return mr;
> > +}
>
> This looks weird. The contents of mr are copied into shmem_list[] and
> then the pointer to mr is returned? Did you mean for the field's type to
> be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
> stash the pointer?
>
> > +
> >  /* A wrapper for use as a VMState .put function */
> >  static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
> >                                const VMStateField *field, JSONWriter
> *vmdesc)
> > @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t
> device_id, size_t config_size)
> >              virtio_vmstate_change, vdev);
> >      vdev->device_endian = virtio_default_endian();
> >      vdev->use_guest_notifier_mask = true;
> > +    vdev->shmem_list = NULL;
> > +    vdev->n_shmem_regions = 0;
> >  }
> >
> >  /*
> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > index 7d5ffdc145..16d598aadc 100644
> > --- a/include/hw/virtio/virtio.h
> > +++ b/include/hw/virtio/virtio.h
> > @@ -165,6 +165,9 @@ struct VirtIODevice
> >       */
> >      EventNotifier config_notifier;
> >      bool device_iotlb_enabled;
> > +    /* Shared memory region for vhost-user mappings. */
> > +    MemoryRegion *shmem_list;
> > +    int n_shmem_regions;
> >  };
> >
> >  struct VirtioDeviceClass {
> > @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue
> *vq);
> >
> >  int virtio_save(VirtIODevice *vdev, QEMUFile *f);
> >
> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
> > +
> >  extern const VMStateInfo virtio_vmstate_info;
> >
> >  #define VMSTATE_VIRTIO_DEVICE \
> > diff --git a/subprojects/libvhost-user/libvhost-user.c
> b/subprojects/libvhost-user/libvhost-user.c
> > index a879149fef..28556d183a 100644
> > --- a/subprojects/libvhost-user/libvhost-user.c
> > +++ b/subprojects/libvhost-user/libvhost-user.c
> > @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char
> uuid[UUID_LEN])
> >      return vu_send_message(dev, &msg);
> >  }
> >
> > +bool
> > +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > +             uint64_t shm_offset, uint64_t len, uint64_t flags)
> > +{
> > +    bool result = false;
> > +    VhostUserMsg msg_reply;
> > +    VhostUserMsg vmsg = {
> > +        .request = VHOST_USER_BACKEND_SHMEM_MAP,
> > +        .size = sizeof(vmsg.payload.mmap),
> > +        .flags = VHOST_USER_VERSION,
> > +        .payload.mmap = {
> > +            .shmid = shmid,
> > +            .fd_offset = fd_offset,
> > +            .shm_offset = shm_offset,
> > +            .len = len,
> > +            .flags = flags,
> > +        },
> > +    };
> > +
> > +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> > +    }
> > +
> > +    pthread_mutex_lock(&dev->backend_mutex);
> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> > +        pthread_mutex_unlock(&dev->backend_mutex);
> > +        return false;
> > +    }
> > +
> > +    /* Also unlocks the backend_mutex */
> > +    return vu_process_message_reply(dev, &vmsg);
> > +}
> > +
> > +bool
> > +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > +               uint64_t shm_offset, uint64_t len)
> > +{
> > +    bool result = false;
> > +    VhostUserMsg msg_reply;
> > +    VhostUserMsg vmsg = {
> > +        .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
> > +        .size = sizeof(vmsg.payload.mmap),
> > +        .flags = VHOST_USER_VERSION,
> > +        .payload.mmap = {
> > +            .shmid = shmid,
> > +            .fd_offset = fd_offset,
>
> What is the meaning of this field? I expected it to be set to 0.
>

Probably true. I just kept it generic for backends to decide what to set it
to,
without considering the real use.
I will remove the parameter and set it to 0 in the request.


>
> > +            .shm_offset = shm_offset,
> > +            .len = len,
> > +        },
> > +    };
> > +
> > +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> > +    }
> > +
> > +    pthread_mutex_lock(&dev->backend_mutex);
> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> > +        pthread_mutex_unlock(&dev->backend_mutex);
> > +        return false;
> > +    }
> > +
> > +    /* Also unlocks the backend_mutex */
> > +    return vu_process_message_reply(dev, &vmsg);
> > +}
> > +
> >  static bool
> >  vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
> >  {
> > diff --git a/subprojects/libvhost-user/libvhost-user.h
> b/subprojects/libvhost-user/libvhost-user.h
> > index deb40e77b3..7f6c22cc1a 100644
> > --- a/subprojects/libvhost-user/libvhost-user.h
> > +++ b/subprojects/libvhost-user/libvhost-user.h
> > @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> >      VHOST_USER_BACKEND_MAX
> >  }  VhostUserBackendRequest;
> >
> > @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
> >      unsigned char uuid[UUID_LEN];
> >  } VhostUserShared;
> >
> > +/* For the flags field of VhostUserMMap */
> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> > +
> > +typedef struct {
> > +    /* VIRTIO Shared Memory Region ID */
> > +    uint8_t shmid;
> > +    uint8_t padding[7];
> > +    /* File offset */
> > +    uint64_t fd_offset;
> > +    /* Offset within the VIRTIO Shared Memory Region */
> > +    uint64_t shm_offset;
> > +    /* Size of the mapping */
> > +    uint64_t len;
> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> > +    uint64_t flags;
> > +} VhostUserMMap;
> > +
> >  #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
> >  # define VU_PACKED __attribute__((gcc_struct, packed))
> >  #else
> > @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
> >          VhostUserVringArea area;
> >          VhostUserInflight inflight;
> >          VhostUserShared object;
> > +        VhostUserMMap mmap;
> >      } payload;
> >
> >      int fds[VHOST_MEMORY_BASELINE_NREGIONS];
> > @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned char
> uuid[UUID_LEN]);
> >   */
> >  bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
> >
> > +/**
> > + * vu_shmem_map:
> > + * @dev: a VuDev context
> > + * @shmid: VIRTIO Shared Memory Region ID
> > + * @fd_offset: File offset
> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> > + * @len: Size of the mapping
> > + * @flags: Flags for the mmap operation
> > + *
> > + * Advertises a new mapping to be made in a given VIRTIO Shared Memory
> Region.
> > + *
> > + * Returns: TRUE on success, FALSE on failure.
> > + */
> > +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > +                  uint64_t shm_offset, uint64_t len, uint64_t flags);
> > +
> > +/**
> > + * vu_shmem_map:
> > + * @dev: a VuDev context
> > + * @shmid: VIRTIO Shared Memory Region ID
> > + * @fd_offset: File offset
> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> > + * @len: Size of the mapping
> > + *
> > + * The front-end un-mmaps a given range in the VIRTIO Shared Memory
> Region
> > + * with the requested `shmid`.
> > + *
> > + * Returns: TRUE on success, FALSE on failure.
> > + */
> > +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > +                  uint64_t shm_offset, uint64_t len);
> > +
> >  /**
> >   * vu_queue_set_notification:
> >   * @dev: a VuDev context
> > --
> > 2.45.2
> >
>
Stefan Hajnoczi Sept. 5, 2024, 4:45 p.m. UTC | #5
On Tue, Sep 03, 2024 at 01:54:12PM +0200, Albert Esteve wrote:
> On Tue, Sep 3, 2024 at 11:54 AM Albert Esteve <aesteve@redhat.com> wrote:
> 
> >
> >
> > On Thu, Jul 11, 2024 at 9:45 AM Stefan Hajnoczi <stefanha@redhat.com>
> > wrote:
> >
> >> On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
> >> > Add SHMEM_MAP/UNMAP requests to vhost-user to
> >> > handle VIRTIO Shared Memory mappings.
> >> >
> >> > This request allows backends to dynamically map
> >> > fds into a VIRTIO Shared Memory Region indentified
> >> > by its `shmid`. Then, the fd memory is advertised
> >> > to the driver as a base addres + offset, so it
> >> > can be read/written (depending on the mmap flags
> >> > requested) while its valid.
> >> >
> >> > The backend can munmap the memory range
> >> > in a given VIRTIO Shared Memory Region (again,
> >> > identified by its `shmid`), to free it. Upon
> >> > receiving this message, the front-end must
> >> > mmap the regions with PROT_NONE to reserve
> >> > the virtual memory space.
> >> >
> >> > The device model needs to create MemoryRegion
> >> > instances for the VIRTIO Shared Memory Regions
> >> > and add them to the `VirtIODevice` instance.
> >> >
> >> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> >> > ---
> >> >  docs/interop/vhost-user.rst               |  27 +++++
> >> >  hw/virtio/vhost-user.c                    | 122 ++++++++++++++++++++++
> >> >  hw/virtio/virtio.c                        |  12 +++
> >> >  include/hw/virtio/virtio.h                |   5 +
> >> >  subprojects/libvhost-user/libvhost-user.c |  65 ++++++++++++
> >> >  subprojects/libvhost-user/libvhost-user.h |  53 ++++++++++
> >> >  6 files changed, 284 insertions(+)
> >> >
> >> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> >> > index d8419fd2f1..d52ba719d5 100644
> >> > --- a/docs/interop/vhost-user.rst
> >> > +++ b/docs/interop/vhost-user.rst
> >> > @@ -1859,6 +1859,33 @@ is sent by the front-end.
> >> >    when the operation is successful, or non-zero otherwise. Note that
> >> if the
> >> >    operation fails, no fd is sent to the backend.
> >> >
> >> > +``VHOST_USER_BACKEND_SHMEM_MAP``
> >> > +  :id: 9
> >> > +  :equivalent ioctl: N/A
> >> > +  :request payload: fd and ``struct VhostUserMMap``
> >> > +  :reply payload: N/A
> >> > +
> >> > +  This message can be submitted by the backends to advertise a new
> >> mapping
> >> > +  to be made in a given VIRTIO Shared Memory Region. Upon receiving
> >> the message,
> >> > +  The front-end will mmap the given fd into the VIRTIO Shared Memory
> >> Region
> >> > +  with the requested ``shmid``. A reply is generated indicating
> >> whether mapping
> >> > +  succeeded.
> >> > +
> >> > +  Mapping over an already existing map is not allowed and request
> >> shall fail.
> >> > +  Therefore, the memory range in the request must correspond with a
> >> valid,
> >> > +  free region of the VIRTIO Shared Memory Region.
> >> > +
> >> > +``VHOST_USER_BACKEND_SHMEM_UNMAP``
> >> > +  :id: 10
> >> > +  :equivalent ioctl: N/A
> >> > +  :request payload: ``struct VhostUserMMap``
> >> > +  :reply payload: N/A
> >> > +
> >> > +  This message can be submitted by the backends so that the front-end
> >> un-mmap
> >> > +  a given range (``offset``, ``len``) in the VIRTIO Shared Memory
> >> Region with
> >>
> >> s/offset/shm_offset/
> >>
> >> > +  the requested ``shmid``.
> >>
> >> Please clarify that <offset, len> must correspond to the entirety of a
> >> valid mapped region.
> >>
> >> By the way, the VIRTIO 1.3 gives the following behavior for the virtiofs
> >> DAX Window:
> >>
> >>   When a FUSE_SETUPMAPPING request perfectly overlaps a previous
> >>   mapping, the previous mapping is replaced. When a mapping partially
> >>   overlaps a previous mapping, the previous mapping is split into one or
> >>   two smaller mappings. When a mapping is partially unmapped it is also
> >>   split into one or two smaller mappings.
> >>
> >>   Establishing new mappings or splitting existing mappings consumes
> >>   resources. If the device runs out of resources the FUSE_SETUPMAPPING
> >>   request fails until resources are available again following
> >>   FUSE_REMOVEMAPPING.
> >>
> >> I think SETUPMAPPING/REMOVMAPPING can be implemented using
> >> SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
> >> partial ranges, but as far as I know that's not necessary for virtiofs
> >> in practice.
> >>
> >> It's worth mentioning that mappings consume resources and that SHMEM_MAP
> >> can fail when there are no resources available. The process-wide limit
> >> is vm.max_map_count on Linux although a vhost-user frontend may reduce
> >> it further to control vhost-user resource usage.
> >>
> >> > +  A reply is generated indicating whether unmapping succeeded.
> >> > +
> >> >  .. _reply_ack:
> >> >
> >> >  VHOST_USER_PROTOCOL_F_REPLY_ACK
> >> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> >> > index cdf9af4a4b..7ee8a472c6 100644
> >> > --- a/hw/virtio/vhost-user.c
> >> > +++ b/hw/virtio/vhost-user.c
> >> > @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
> >> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> >> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> >> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> >> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
> >> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> >> >      VHOST_USER_BACKEND_MAX
> >> >  }  VhostUserBackendRequest;
> >> >
> >> > @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
> >> >      unsigned char uuid[16];
> >> >  } VhostUserShared;
> >> >
> >> > +/* For the flags field of VhostUserMMap */
> >> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> >> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> >> > +
> >> > +typedef struct {
> >> > +    /* VIRTIO Shared Memory Region ID */
> >> > +    uint8_t shmid;
> >> > +    uint8_t padding[7];
> >> > +    /* File offset */
> >> > +    uint64_t fd_offset;
> >> > +    /* Offset within the VIRTIO Shared Memory Region */
> >> > +    uint64_t shm_offset;
> >> > +    /* Size of the mapping */
> >> > +    uint64_t len;
> >> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> >> > +    uint64_t flags;
> >> > +} VhostUserMMap;
> >> > +
> >> >  typedef struct {
> >> >      VhostUserRequest request;
> >> >
> >> > @@ -224,6 +244,7 @@ typedef union {
> >> >          VhostUserInflight inflight;
> >> >          VhostUserShared object;
> >> >          VhostUserTransferDeviceState transfer_state;
> >> > +        VhostUserMMap mmap;
> >> >  } VhostUserPayload;
> >> >
> >> >  typedef struct VhostUserMsg {
> >> > @@ -1748,6 +1769,100 @@
> >> vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
> >> >      return 0;
> >> >  }
> >> >
> >> > +static int
> >> > +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
> >> > +                                    VhostUserMMap *vu_mmap,
> >> > +                                    int fd)
> >> > +{
> >> > +    void *addr = 0;
> >> > +    MemoryRegion *mr = NULL;
> >> > +
> >> > +    if (fd < 0) {
> >> > +        error_report("Bad fd for map");
> >> > +        return -EBADF;
> >> > +    }
> >> > +
> >> > +    if (!dev->vdev->shmem_list ||
> >> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> >> > +        error_report("Device only has %d VIRTIO Shared Memory Regions.
> >> "
> >> > +                     "Requested ID: %d",
> >> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
> >> > +        return -EFAULT;
> >> > +    }
> >> > +
> >> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> >> > +
> >> > +    if (!mr) {
> >> > +        error_report("VIRTIO Shared Memory Region at "
> >> > +                     "ID %d unitialized", vu_mmap->shmid);
> >> > +        return -EFAULT;
> >> > +    }
> >> > +
> >> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> >> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> >> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> >> > +                     vu_mmap->shm_offset, vu_mmap->len);
> >> > +        return -EFAULT;
> >> > +    }
> >> > +
> >> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> >> > +
> >> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
> >>
> >> Missing check for overlap between range [shm_offset, shm_offset + len)
> >> and existing mappings.
> >>
> >
> > Not sure how to do this check. Specifically, I am not sure how previous
> > ranges are stored within the MemoryRegion. Is looping through
> > mr->subregions
> > a valid option?
> >
> 
> Maybe something like this would do?
> ```
>      if (memory_region_find(mr, vu_mmap->shm_offset, vu_mmap->len).mr) {
>         error_report("Requested memory (%" PRIx64 "+%" PRIx64 " overalps "
>                      "with previously mapped memory",
>                      vu_mmap->shm_offset, vu_mmap->len);
>         return -EFAULT;
>     }
> ```

I don't think that works because the QEMU MemoryRegion covers the entire
range, some of which contains mappings and some of which is empty. It
would be necessary to track mappings that have been made.

I'm not aware of a security implication if the overlap check is missing,
so I guess it may be okay to skip it and rely on the vhost-user back-end
author to honor the spec. I'm not totally against that because it's
faster and less code, but it feels a bit iffy to not enforce the input
validation that the spec requires.

Maintain a list of mappings so this check can be performed?

> 
> >
> >
> >>
> >> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
> >> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
> >> > +        MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
> >> > +
> >> > +    if (addr == MAP_FAILED) {
> >> > +        error_report("Failed to mmap mem fd");
> >> > +        return -EFAULT;
> >> > +    }
> >> > +
> >> > +    return 0;
> >> > +}
> >> > +
> >> > +static int
> >> > +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
> >> > +                                      VhostUserMMap *vu_mmap)
> >> > +{
> >> > +    void *addr = 0;
> >> > +    MemoryRegion *mr = NULL;
> >> > +
> >> > +    if (!dev->vdev->shmem_list ||
> >> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> >> > +        error_report("Device only has %d VIRTIO Shared Memory Regions.
> >> "
> >> > +                     "Requested ID: %d",
> >> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
> >> > +        return -EFAULT;
> >> > +    }
> >> > +
> >> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> >> > +
> >> > +    if (!mr) {
> >> > +        error_report("VIRTIO Shared Memory Region at "
> >> > +                     "ID %d unitialized", vu_mmap->shmid);
> >> > +        return -EFAULT;
> >> > +    }
> >> > +
> >> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> >> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> >> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> >> > +                     vu_mmap->shm_offset, vu_mmap->len);
> >> > +        return -EFAULT;
> >> > +    }
> >> > +
> >> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> >> > +
> >> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
> >>
> >> Missing check for existing mapping with exact range [shm_offset, len)
> >> match.
> >>
> >> > +                PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> >> -1, 0);
> >> > +
> >> > +    if (addr == MAP_FAILED) {
> >> > +        error_report("Failed to unmap memory");
> >> > +        return -EFAULT;
> >> > +    }
> >> > +
> >> > +    return 0;
> >> > +}
> >> > +
> >> >  static void close_backend_channel(struct vhost_user *u)
> >> >  {
> >> >      g_source_destroy(u->backend_src);
> >> > @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc,
> >> GIOCondition condition,
> >> >          ret =
> >> vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
> >> >                                                               &hdr,
> >> &payload);
> >> >          break;
> >> > +    case VHOST_USER_BACKEND_SHMEM_MAP:
> >> > +        ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
> >> > +                                                  fd ? fd[0] : -1);
> >> > +        break;
> >> > +    case VHOST_USER_BACKEND_SHMEM_UNMAP:
> >> > +        ret = vhost_user_backend_handle_shmem_unmap(dev,
> >> &payload.mmap);
> >> > +        break;
> >> >      default:
> >> >          error_report("Received unexpected msg type: %d.", hdr.request);
> >> >          ret = -EINVAL;
> >> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >> > index 893a072c9d..9f2da5b11e 100644
> >> > --- a/hw/virtio/virtio.c
> >> > +++ b/hw/virtio/virtio.c
> >> > @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
> >> >      return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
> >> >  }
> >> >
> >> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
> >> > +{
> >> > +    MemoryRegion *mr = g_new0(MemoryRegion, 1);
> >> > +    ++vdev->n_shmem_regions;
> >> > +    vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
> >> > +                               vdev->n_shmem_regions);
> >>
> >> Where is shmem_list freed?
> >>
> >> The name "list" is misleading since this is an array, not a list.
> >>
> >> > +    vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
> >> > +    return mr;
> >> > +}
> >>
> >> This looks weird. The contents of mr are copied into shmem_list[] and
> >> then the pointer to mr is returned? Did you mean for the field's type to
> >> be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
> >> stash the pointer?
> >>
> >> > +
> >> >  /* A wrapper for use as a VMState .put function */
> >> >  static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
> >> >                                const VMStateField *field, JSONWriter
> >> *vmdesc)
> >> > @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t
> >> device_id, size_t config_size)
> >> >              virtio_vmstate_change, vdev);
> >> >      vdev->device_endian = virtio_default_endian();
> >> >      vdev->use_guest_notifier_mask = true;
> >> > +    vdev->shmem_list = NULL;
> >> > +    vdev->n_shmem_regions = 0;
> >> >  }
> >> >
> >> >  /*
> >> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >> > index 7d5ffdc145..16d598aadc 100644
> >> > --- a/include/hw/virtio/virtio.h
> >> > +++ b/include/hw/virtio/virtio.h
> >> > @@ -165,6 +165,9 @@ struct VirtIODevice
> >> >       */
> >> >      EventNotifier config_notifier;
> >> >      bool device_iotlb_enabled;
> >> > +    /* Shared memory region for vhost-user mappings. */
> >> > +    MemoryRegion *shmem_list;
> >> > +    int n_shmem_regions;
> >> >  };
> >> >
> >> >  struct VirtioDeviceClass {
> >> > @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue
> >> *vq);
> >> >
> >> >  int virtio_save(VirtIODevice *vdev, QEMUFile *f);
> >> >
> >> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
> >> > +
> >> >  extern const VMStateInfo virtio_vmstate_info;
> >> >
> >> >  #define VMSTATE_VIRTIO_DEVICE \
> >> > diff --git a/subprojects/libvhost-user/libvhost-user.c
> >> b/subprojects/libvhost-user/libvhost-user.c
> >> > index a879149fef..28556d183a 100644
> >> > --- a/subprojects/libvhost-user/libvhost-user.c
> >> > +++ b/subprojects/libvhost-user/libvhost-user.c
> >> > @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char
> >> uuid[UUID_LEN])
> >> >      return vu_send_message(dev, &msg);
> >> >  }
> >> >
> >> > +bool
> >> > +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> >> > +             uint64_t shm_offset, uint64_t len, uint64_t flags)
> >> > +{
> >> > +    bool result = false;
> >> > +    VhostUserMsg msg_reply;
> >> > +    VhostUserMsg vmsg = {
> >> > +        .request = VHOST_USER_BACKEND_SHMEM_MAP,
> >> > +        .size = sizeof(vmsg.payload.mmap),
> >> > +        .flags = VHOST_USER_VERSION,
> >> > +        .payload.mmap = {
> >> > +            .shmid = shmid,
> >> > +            .fd_offset = fd_offset,
> >> > +            .shm_offset = shm_offset,
> >> > +            .len = len,
> >> > +            .flags = flags,
> >> > +        },
> >> > +    };
> >> > +
> >> > +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK))
> >> {
> >> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> >> > +    }
> >> > +
> >> > +    pthread_mutex_lock(&dev->backend_mutex);
> >> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> >> > +        pthread_mutex_unlock(&dev->backend_mutex);
> >> > +        return false;
> >> > +    }
> >> > +
> >> > +    /* Also unlocks the backend_mutex */
> >> > +    return vu_process_message_reply(dev, &vmsg);
> >> > +}
> >> > +
> >> > +bool
> >> > +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> >> > +               uint64_t shm_offset, uint64_t len)
> >> > +{
> >> > +    bool result = false;
> >> > +    VhostUserMsg msg_reply;
> >> > +    VhostUserMsg vmsg = {
> >> > +        .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
> >> > +        .size = sizeof(vmsg.payload.mmap),
> >> > +        .flags = VHOST_USER_VERSION,
> >> > +        .payload.mmap = {
> >> > +            .shmid = shmid,
> >> > +            .fd_offset = fd_offset,
> >>
> >> What is the meaning of this field? I expected it to be set to 0.
> >>
> >> > +            .shm_offset = shm_offset,
> >> > +            .len = len,
> >> > +        },
> >> > +    };
> >> > +
> >> > +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK))
> >> {
> >> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> >> > +    }
> >> > +
> >> > +    pthread_mutex_lock(&dev->backend_mutex);
> >> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> >> > +        pthread_mutex_unlock(&dev->backend_mutex);
> >> > +        return false;
> >> > +    }
> >> > +
> >> > +    /* Also unlocks the backend_mutex */
> >> > +    return vu_process_message_reply(dev, &vmsg);
> >> > +}
> >> > +
> >> >  static bool
> >> >  vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
> >> >  {
> >> > diff --git a/subprojects/libvhost-user/libvhost-user.h
> >> b/subprojects/libvhost-user/libvhost-user.h
> >> > index deb40e77b3..7f6c22cc1a 100644
> >> > --- a/subprojects/libvhost-user/libvhost-user.h
> >> > +++ b/subprojects/libvhost-user/libvhost-user.h
> >> > @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
> >> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> >> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> >> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> >> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
> >> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> >> >      VHOST_USER_BACKEND_MAX
> >> >  }  VhostUserBackendRequest;
> >> >
> >> > @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
> >> >      unsigned char uuid[UUID_LEN];
> >> >  } VhostUserShared;
> >> >
> >> > +/* For the flags field of VhostUserMMap */
> >> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> >> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> >> > +
> >> > +typedef struct {
> >> > +    /* VIRTIO Shared Memory Region ID */
> >> > +    uint8_t shmid;
> >> > +    uint8_t padding[7];
> >> > +    /* File offset */
> >> > +    uint64_t fd_offset;
> >> > +    /* Offset within the VIRTIO Shared Memory Region */
> >> > +    uint64_t shm_offset;
> >> > +    /* Size of the mapping */
> >> > +    uint64_t len;
> >> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> >> > +    uint64_t flags;
> >> > +} VhostUserMMap;
> >> > +
> >> >  #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
> >> >  # define VU_PACKED __attribute__((gcc_struct, packed))
> >> >  #else
> >> > @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
> >> >          VhostUserVringArea area;
> >> >          VhostUserInflight inflight;
> >> >          VhostUserShared object;
> >> > +        VhostUserMMap mmap;
> >> >      } payload;
> >> >
> >> >      int fds[VHOST_MEMORY_BASELINE_NREGIONS];
> >> > @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned
> >> char uuid[UUID_LEN]);
> >> >   */
> >> >  bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
> >> >
> >> > +/**
> >> > + * vu_shmem_map:
> >> > + * @dev: a VuDev context
> >> > + * @shmid: VIRTIO Shared Memory Region ID
> >> > + * @fd_offset: File offset
> >> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> >> > + * @len: Size of the mapping
> >> > + * @flags: Flags for the mmap operation
> >> > + *
> >> > + * Advertises a new mapping to be made in a given VIRTIO Shared Memory
> >> Region.
> >> > + *
> >> > + * Returns: TRUE on success, FALSE on failure.
> >> > + */
> >> > +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> >> > +                  uint64_t shm_offset, uint64_t len, uint64_t flags);
> >> > +
> >> > +/**
> >> > + * vu_shmem_map:
> >> > + * @dev: a VuDev context
> >> > + * @shmid: VIRTIO Shared Memory Region ID
> >> > + * @fd_offset: File offset
> >> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> >> > + * @len: Size of the mapping
> >> > + *
> >> > + * The front-end un-mmaps a given range in the VIRTIO Shared Memory
> >> Region
> >> > + * with the requested `shmid`.
> >> > + *
> >> > + * Returns: TRUE on success, FALSE on failure.
> >> > + */
> >> > +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> >> > +                  uint64_t shm_offset, uint64_t len);
> >> > +
> >> >  /**
> >> >   * vu_queue_set_notification:
> >> >   * @dev: a VuDev context
> >> > --
> >> > 2.45.2
> >> >
> >>
> >
Albert Esteve Sept. 11, 2024, 11:57 a.m. UTC | #6
On Thu, Sep 5, 2024 at 6:45 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Tue, Sep 03, 2024 at 01:54:12PM +0200, Albert Esteve wrote:
> > On Tue, Sep 3, 2024 at 11:54 AM Albert Esteve <aesteve@redhat.com>
> wrote:
> >
> > >
> > >
> > > On Thu, Jul 11, 2024 at 9:45 AM Stefan Hajnoczi <stefanha@redhat.com>
> > > wrote:
> > >
> > >> On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
> > >> > Add SHMEM_MAP/UNMAP requests to vhost-user to
> > >> > handle VIRTIO Shared Memory mappings.
> > >> >
> > >> > This request allows backends to dynamically map
> > >> > fds into a VIRTIO Shared Memory Region indentified
> > >> > by its `shmid`. Then, the fd memory is advertised
> > >> > to the driver as a base addres + offset, so it
> > >> > can be read/written (depending on the mmap flags
> > >> > requested) while its valid.
> > >> >
> > >> > The backend can munmap the memory range
> > >> > in a given VIRTIO Shared Memory Region (again,
> > >> > identified by its `shmid`), to free it. Upon
> > >> > receiving this message, the front-end must
> > >> > mmap the regions with PROT_NONE to reserve
> > >> > the virtual memory space.
> > >> >
> > >> > The device model needs to create MemoryRegion
> > >> > instances for the VIRTIO Shared Memory Regions
> > >> > and add them to the `VirtIODevice` instance.
> > >> >
> > >> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > >> > ---
> > >> >  docs/interop/vhost-user.rst               |  27 +++++
> > >> >  hw/virtio/vhost-user.c                    | 122
> ++++++++++++++++++++++
> > >> >  hw/virtio/virtio.c                        |  12 +++
> > >> >  include/hw/virtio/virtio.h                |   5 +
> > >> >  subprojects/libvhost-user/libvhost-user.c |  65 ++++++++++++
> > >> >  subprojects/libvhost-user/libvhost-user.h |  53 ++++++++++
> > >> >  6 files changed, 284 insertions(+)
> > >> >
> > >> > diff --git a/docs/interop/vhost-user.rst
> b/docs/interop/vhost-user.rst
> > >> > index d8419fd2f1..d52ba719d5 100644
> > >> > --- a/docs/interop/vhost-user.rst
> > >> > +++ b/docs/interop/vhost-user.rst
> > >> > @@ -1859,6 +1859,33 @@ is sent by the front-end.
> > >> >    when the operation is successful, or non-zero otherwise. Note
> that
> > >> if the
> > >> >    operation fails, no fd is sent to the backend.
> > >> >
> > >> > +``VHOST_USER_BACKEND_SHMEM_MAP``
> > >> > +  :id: 9
> > >> > +  :equivalent ioctl: N/A
> > >> > +  :request payload: fd and ``struct VhostUserMMap``
> > >> > +  :reply payload: N/A
> > >> > +
> > >> > +  This message can be submitted by the backends to advertise a new
> > >> mapping
> > >> > +  to be made in a given VIRTIO Shared Memory Region. Upon receiving
> > >> the message,
> > >> > +  The front-end will mmap the given fd into the VIRTIO Shared
> Memory
> > >> Region
> > >> > +  with the requested ``shmid``. A reply is generated indicating
> > >> whether mapping
> > >> > +  succeeded.
> > >> > +
> > >> > +  Mapping over an already existing map is not allowed and request
> > >> shall fail.
> > >> > +  Therefore, the memory range in the request must correspond with a
> > >> valid,
> > >> > +  free region of the VIRTIO Shared Memory Region.
> > >> > +
> > >> > +``VHOST_USER_BACKEND_SHMEM_UNMAP``
> > >> > +  :id: 10
> > >> > +  :equivalent ioctl: N/A
> > >> > +  :request payload: ``struct VhostUserMMap``
> > >> > +  :reply payload: N/A
> > >> > +
> > >> > +  This message can be submitted by the backends so that the
> front-end
> > >> un-mmap
> > >> > +  a given range (``offset``, ``len``) in the VIRTIO Shared Memory
> > >> Region with
> > >>
> > >> s/offset/shm_offset/
> > >>
> > >> > +  the requested ``shmid``.
> > >>
> > >> Please clarify that <offset, len> must correspond to the entirety of a
> > >> valid mapped region.
> > >>
> > >> By the way, the VIRTIO 1.3 gives the following behavior for the
> virtiofs
> > >> DAX Window:
> > >>
> > >>   When a FUSE_SETUPMAPPING request perfectly overlaps a previous
> > >>   mapping, the previous mapping is replaced. When a mapping partially
> > >>   overlaps a previous mapping, the previous mapping is split into one
> or
> > >>   two smaller mappings. When a mapping is partially unmapped it is
> also
> > >>   split into one or two smaller mappings.
> > >>
> > >>   Establishing new mappings or splitting existing mappings consumes
> > >>   resources. If the device runs out of resources the FUSE_SETUPMAPPING
> > >>   request fails until resources are available again following
> > >>   FUSE_REMOVEMAPPING.
> > >>
> > >> I think SETUPMAPPING/REMOVMAPPING can be implemented using
> > >> SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
> > >> partial ranges, but as far as I know that's not necessary for virtiofs
> > >> in practice.
> > >>
> > >> It's worth mentioning that mappings consume resources and that
> SHMEM_MAP
> > >> can fail when there are no resources available. The process-wide limit
> > >> is vm.max_map_count on Linux although a vhost-user frontend may reduce
> > >> it further to control vhost-user resource usage.
> > >>
> > >> > +  A reply is generated indicating whether unmapping succeeded.
> > >> > +
> > >> >  .. _reply_ack:
> > >> >
> > >> >  VHOST_USER_PROTOCOL_F_REPLY_ACK
> > >> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > >> > index cdf9af4a4b..7ee8a472c6 100644
> > >> > --- a/hw/virtio/vhost-user.c
> > >> > +++ b/hw/virtio/vhost-user.c
> > >> > @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
> > >> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> > >> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> > >> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> > >> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
> > >> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> > >> >      VHOST_USER_BACKEND_MAX
> > >> >  }  VhostUserBackendRequest;
> > >> >
> > >> > @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
> > >> >      unsigned char uuid[16];
> > >> >  } VhostUserShared;
> > >> >
> > >> > +/* For the flags field of VhostUserMMap */
> > >> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> > >> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> > >> > +
> > >> > +typedef struct {
> > >> > +    /* VIRTIO Shared Memory Region ID */
> > >> > +    uint8_t shmid;
> > >> > +    uint8_t padding[7];
> > >> > +    /* File offset */
> > >> > +    uint64_t fd_offset;
> > >> > +    /* Offset within the VIRTIO Shared Memory Region */
> > >> > +    uint64_t shm_offset;
> > >> > +    /* Size of the mapping */
> > >> > +    uint64_t len;
> > >> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> > >> > +    uint64_t flags;
> > >> > +} VhostUserMMap;
> > >> > +
> > >> >  typedef struct {
> > >> >      VhostUserRequest request;
> > >> >
> > >> > @@ -224,6 +244,7 @@ typedef union {
> > >> >          VhostUserInflight inflight;
> > >> >          VhostUserShared object;
> > >> >          VhostUserTransferDeviceState transfer_state;
> > >> > +        VhostUserMMap mmap;
> > >> >  } VhostUserPayload;
> > >> >
> > >> >  typedef struct VhostUserMsg {
> > >> > @@ -1748,6 +1769,100 @@
> > >> vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
> > >> >      return 0;
> > >> >  }
> > >> >
> > >> > +static int
> > >> > +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
> > >> > +                                    VhostUserMMap *vu_mmap,
> > >> > +                                    int fd)
> > >> > +{
> > >> > +    void *addr = 0;
> > >> > +    MemoryRegion *mr = NULL;
> > >> > +
> > >> > +    if (fd < 0) {
> > >> > +        error_report("Bad fd for map");
> > >> > +        return -EBADF;
> > >> > +    }
> > >> > +
> > >> > +    if (!dev->vdev->shmem_list ||
> > >> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > >> > +        error_report("Device only has %d VIRTIO Shared Memory
> Regions.
> > >> "
> > >> > +                     "Requested ID: %d",
> > >> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
> > >> > +        return -EFAULT;
> > >> > +    }
> > >> > +
> > >> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > >> > +
> > >> > +    if (!mr) {
> > >> > +        error_report("VIRTIO Shared Memory Region at "
> > >> > +                     "ID %d unitialized", vu_mmap->shmid);
> > >> > +        return -EFAULT;
> > >> > +    }
> > >> > +
> > >> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > >> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > >> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%"
> PRIx64,
> > >> > +                     vu_mmap->shm_offset, vu_mmap->len);
> > >> > +        return -EFAULT;
> > >> > +    }
> > >> > +
> > >> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > >> > +
> > >> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
> > >>
> > >> Missing check for overlap between range [shm_offset, shm_offset + len)
> > >> and existing mappings.
> > >>
> > >
> > > Not sure how to do this check. Specifically, I am not sure how previous
> > > ranges are stored within the MemoryRegion. Is looping through
> > > mr->subregions
> > > a valid option?
> > >
> >
> > Maybe something like this would do?
> > ```
> >      if (memory_region_find(mr, vu_mmap->shm_offset, vu_mmap->len).mr) {
> >         error_report("Requested memory (%" PRIx64 "+%" PRIx64 " overalps
> "
> >                      "with previously mapped memory",
> >                      vu_mmap->shm_offset, vu_mmap->len);
> >         return -EFAULT;
> >     }
> > ```
>
> I don't think that works because the QEMU MemoryRegion covers the entire
> range, some of which contains mappings and some of which is empty. It
> would be necessary to track mappings that have been made.
>
> I'm not aware of a security implication if the overlap check is missing,
> so I guess it may be okay to skip it and rely on the vhost-user back-end
> author to honor the spec. I'm not totally against that because it's
> faster and less code, but it feels a bit iffy to not enforce the input
> validation that the spec requires.
>
> Maintain a list of mappings so this check can be performed?
>
>
Ok, I prefer to aim for the better solution and see where that takes us.
So I will add a mapped_regions list or something like that to the
MemoryRegion struct in a new commit, so that it can be reviewed
independently. With the infrastructure's code in the patch we can decide if
it is worth to have it.

Thank you!


> >
> > >
> > >
> > >>
> > >> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0)
> |
> > >> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE :
> 0),
> > >> > +        MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
> > >> > +
> > >> > +    if (addr == MAP_FAILED) {
> > >> > +        error_report("Failed to mmap mem fd");
> > >> > +        return -EFAULT;
> > >> > +    }
> > >> > +
> > >> > +    return 0;
> > >> > +}
> > >> > +
> > >> > +static int
> > >> > +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
> > >> > +                                      VhostUserMMap *vu_mmap)
> > >> > +{
> > >> > +    void *addr = 0;
> > >> > +    MemoryRegion *mr = NULL;
> > >> > +
> > >> > +    if (!dev->vdev->shmem_list ||
> > >> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> > >> > +        error_report("Device only has %d VIRTIO Shared Memory
> Regions.
> > >> "
> > >> > +                     "Requested ID: %d",
> > >> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
> > >> > +        return -EFAULT;
> > >> > +    }
> > >> > +
> > >> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> > >> > +
> > >> > +    if (!mr) {
> > >> > +        error_report("VIRTIO Shared Memory Region at "
> > >> > +                     "ID %d unitialized", vu_mmap->shmid);
> > >> > +        return -EFAULT;
> > >> > +    }
> > >> > +
> > >> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> > >> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> > >> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%"
> PRIx64,
> > >> > +                     vu_mmap->shm_offset, vu_mmap->len);
> > >> > +        return -EFAULT;
> > >> > +    }
> > >> > +
> > >> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> > >> > +
> > >> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
> > >>
> > >> Missing check for existing mapping with exact range [shm_offset, len)
> > >> match.
> > >>
> > >> > +                PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> > >> -1, 0);
> > >> > +
> > >> > +    if (addr == MAP_FAILED) {
> > >> > +        error_report("Failed to unmap memory");
> > >> > +        return -EFAULT;
> > >> > +    }
> > >> > +
> > >> > +    return 0;
> > >> > +}
> > >> > +
> > >> >  static void close_backend_channel(struct vhost_user *u)
> > >> >  {
> > >> >      g_source_destroy(u->backend_src);
> > >> > @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc,
> > >> GIOCondition condition,
> > >> >          ret =
> > >> vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
> > >> >                                                               &hdr,
> > >> &payload);
> > >> >          break;
> > >> > +    case VHOST_USER_BACKEND_SHMEM_MAP:
> > >> > +        ret = vhost_user_backend_handle_shmem_map(dev,
> &payload.mmap,
> > >> > +                                                  fd ? fd[0] : -1);
> > >> > +        break;
> > >> > +    case VHOST_USER_BACKEND_SHMEM_UNMAP:
> > >> > +        ret = vhost_user_backend_handle_shmem_unmap(dev,
> > >> &payload.mmap);
> > >> > +        break;
> > >> >      default:
> > >> >          error_report("Received unexpected msg type: %d.",
> hdr.request);
> > >> >          ret = -EINVAL;
> > >> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > >> > index 893a072c9d..9f2da5b11e 100644
> > >> > --- a/hw/virtio/virtio.c
> > >> > +++ b/hw/virtio/virtio.c
> > >> > @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile
> *f)
> > >> >      return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
> > >> >  }
> > >> >
> > >> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
> > >> > +{
> > >> > +    MemoryRegion *mr = g_new0(MemoryRegion, 1);
> > >> > +    ++vdev->n_shmem_regions;
> > >> > +    vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
> > >> > +                               vdev->n_shmem_regions);
> > >>
> > >> Where is shmem_list freed?
> > >>
> > >> The name "list" is misleading since this is an array, not a list.
> > >>
> > >> > +    vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
> > >> > +    return mr;
> > >> > +}
> > >>
> > >> This looks weird. The contents of mr are copied into shmem_list[] and
> > >> then the pointer to mr is returned? Did you mean for the field's type
> to
> > >> be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
> > >> stash the pointer?
> > >>
> > >> > +
> > >> >  /* A wrapper for use as a VMState .put function */
> > >> >  static int virtio_device_put(QEMUFile *f, void *opaque, size_t
> size,
> > >> >                                const VMStateField *field, JSONWriter
> > >> *vmdesc)
> > >> > @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t
> > >> device_id, size_t config_size)
> > >> >              virtio_vmstate_change, vdev);
> > >> >      vdev->device_endian = virtio_default_endian();
> > >> >      vdev->use_guest_notifier_mask = true;
> > >> > +    vdev->shmem_list = NULL;
> > >> > +    vdev->n_shmem_regions = 0;
> > >> >  }
> > >> >
> > >> >  /*
> > >> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > >> > index 7d5ffdc145..16d598aadc 100644
> > >> > --- a/include/hw/virtio/virtio.h
> > >> > +++ b/include/hw/virtio/virtio.h
> > >> > @@ -165,6 +165,9 @@ struct VirtIODevice
> > >> >       */
> > >> >      EventNotifier config_notifier;
> > >> >      bool device_iotlb_enabled;
> > >> > +    /* Shared memory region for vhost-user mappings. */
> > >> > +    MemoryRegion *shmem_list;
> > >> > +    int n_shmem_regions;
> > >> >  };
> > >> >
> > >> >  struct VirtioDeviceClass {
> > >> > @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue
> > >> *vq);
> > >> >
> > >> >  int virtio_save(VirtIODevice *vdev, QEMUFile *f);
> > >> >
> > >> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
> > >> > +
> > >> >  extern const VMStateInfo virtio_vmstate_info;
> > >> >
> > >> >  #define VMSTATE_VIRTIO_DEVICE \
> > >> > diff --git a/subprojects/libvhost-user/libvhost-user.c
> > >> b/subprojects/libvhost-user/libvhost-user.c
> > >> > index a879149fef..28556d183a 100644
> > >> > --- a/subprojects/libvhost-user/libvhost-user.c
> > >> > +++ b/subprojects/libvhost-user/libvhost-user.c
> > >> > @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char
> > >> uuid[UUID_LEN])
> > >> >      return vu_send_message(dev, &msg);
> > >> >  }
> > >> >
> > >> > +bool
> > >> > +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > >> > +             uint64_t shm_offset, uint64_t len, uint64_t flags)
> > >> > +{
> > >> > +    bool result = false;
> > >> > +    VhostUserMsg msg_reply;
> > >> > +    VhostUserMsg vmsg = {
> > >> > +        .request = VHOST_USER_BACKEND_SHMEM_MAP,
> > >> > +        .size = sizeof(vmsg.payload.mmap),
> > >> > +        .flags = VHOST_USER_VERSION,
> > >> > +        .payload.mmap = {
> > >> > +            .shmid = shmid,
> > >> > +            .fd_offset = fd_offset,
> > >> > +            .shm_offset = shm_offset,
> > >> > +            .len = len,
> > >> > +            .flags = flags,
> > >> > +        },
> > >> > +    };
> > >> > +
> > >> > +    if (vu_has_protocol_feature(dev,
> VHOST_USER_PROTOCOL_F_REPLY_ACK))
> > >> {
> > >> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> > >> > +    }
> > >> > +
> > >> > +    pthread_mutex_lock(&dev->backend_mutex);
> > >> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> > >> > +        pthread_mutex_unlock(&dev->backend_mutex);
> > >> > +        return false;
> > >> > +    }
> > >> > +
> > >> > +    /* Also unlocks the backend_mutex */
> > >> > +    return vu_process_message_reply(dev, &vmsg);
> > >> > +}
> > >> > +
> > >> > +bool
> > >> > +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > >> > +               uint64_t shm_offset, uint64_t len)
> > >> > +{
> > >> > +    bool result = false;
> > >> > +    VhostUserMsg msg_reply;
> > >> > +    VhostUserMsg vmsg = {
> > >> > +        .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
> > >> > +        .size = sizeof(vmsg.payload.mmap),
> > >> > +        .flags = VHOST_USER_VERSION,
> > >> > +        .payload.mmap = {
> > >> > +            .shmid = shmid,
> > >> > +            .fd_offset = fd_offset,
> > >>
> > >> What is the meaning of this field? I expected it to be set to 0.
> > >>
> > >> > +            .shm_offset = shm_offset,
> > >> > +            .len = len,
> > >> > +        },
> > >> > +    };
> > >> > +
> > >> > +    if (vu_has_protocol_feature(dev,
> VHOST_USER_PROTOCOL_F_REPLY_ACK))
> > >> {
> > >> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> > >> > +    }
> > >> > +
> > >> > +    pthread_mutex_lock(&dev->backend_mutex);
> > >> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> > >> > +        pthread_mutex_unlock(&dev->backend_mutex);
> > >> > +        return false;
> > >> > +    }
> > >> > +
> > >> > +    /* Also unlocks the backend_mutex */
> > >> > +    return vu_process_message_reply(dev, &vmsg);
> > >> > +}
> > >> > +
> > >> >  static bool
> > >> >  vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
> > >> >  {
> > >> > diff --git a/subprojects/libvhost-user/libvhost-user.h
> > >> b/subprojects/libvhost-user/libvhost-user.h
> > >> > index deb40e77b3..7f6c22cc1a 100644
> > >> > --- a/subprojects/libvhost-user/libvhost-user.h
> > >> > +++ b/subprojects/libvhost-user/libvhost-user.h
> > >> > @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
> > >> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
> > >> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
> > >> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> > >> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
> > >> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
> > >> >      VHOST_USER_BACKEND_MAX
> > >> >  }  VhostUserBackendRequest;
> > >> >
> > >> > @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
> > >> >      unsigned char uuid[UUID_LEN];
> > >> >  } VhostUserShared;
> > >> >
> > >> > +/* For the flags field of VhostUserMMap */
> > >> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> > >> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> > >> > +
> > >> > +typedef struct {
> > >> > +    /* VIRTIO Shared Memory Region ID */
> > >> > +    uint8_t shmid;
> > >> > +    uint8_t padding[7];
> > >> > +    /* File offset */
> > >> > +    uint64_t fd_offset;
> > >> > +    /* Offset within the VIRTIO Shared Memory Region */
> > >> > +    uint64_t shm_offset;
> > >> > +    /* Size of the mapping */
> > >> > +    uint64_t len;
> > >> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> > >> > +    uint64_t flags;
> > >> > +} VhostUserMMap;
> > >> > +
> > >> >  #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
> > >> >  # define VU_PACKED __attribute__((gcc_struct, packed))
> > >> >  #else
> > >> > @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
> > >> >          VhostUserVringArea area;
> > >> >          VhostUserInflight inflight;
> > >> >          VhostUserShared object;
> > >> > +        VhostUserMMap mmap;
> > >> >      } payload;
> > >> >
> > >> >      int fds[VHOST_MEMORY_BASELINE_NREGIONS];
> > >> > @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned
> > >> char uuid[UUID_LEN]);
> > >> >   */
> > >> >  bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
> > >> >
> > >> > +/**
> > >> > + * vu_shmem_map:
> > >> > + * @dev: a VuDev context
> > >> > + * @shmid: VIRTIO Shared Memory Region ID
> > >> > + * @fd_offset: File offset
> > >> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> > >> > + * @len: Size of the mapping
> > >> > + * @flags: Flags for the mmap operation
> > >> > + *
> > >> > + * Advertises a new mapping to be made in a given VIRTIO Shared
> Memory
> > >> Region.
> > >> > + *
> > >> > + * Returns: TRUE on success, FALSE on failure.
> > >> > + */
> > >> > +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > >> > +                  uint64_t shm_offset, uint64_t len, uint64_t
> flags);
> > >> > +
> > >> > +/**
> > >> > + * vu_shmem_map:
> > >> > + * @dev: a VuDev context
> > >> > + * @shmid: VIRTIO Shared Memory Region ID
> > >> > + * @fd_offset: File offset
> > >> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> > >> > + * @len: Size of the mapping
> > >> > + *
> > >> > + * The front-end un-mmaps a given range in the VIRTIO Shared Memory
> > >> Region
> > >> > + * with the requested `shmid`.
> > >> > + *
> > >> > + * Returns: TRUE on success, FALSE on failure.
> > >> > + */
> > >> > +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> > >> > +                  uint64_t shm_offset, uint64_t len);
> > >> > +
> > >> >  /**
> > >> >   * vu_queue_set_notification:
> > >> >   * @dev: a VuDev context
> > >> > --
> > >> > 2.45.2
> > >> >
> > >>
> > >
>
Stefan Hajnoczi Sept. 11, 2024, 2:54 p.m. UTC | #7
On Wed, 11 Sept 2024 at 07:58, Albert Esteve <aesteve@redhat.com> wrote:
> On Thu, Sep 5, 2024 at 6:45 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>
>> On Tue, Sep 03, 2024 at 01:54:12PM +0200, Albert Esteve wrote:
>> > On Tue, Sep 3, 2024 at 11:54 AM Albert Esteve <aesteve@redhat.com> wrote:
>> >
>> > >
>> > >
>> > > On Thu, Jul 11, 2024 at 9:45 AM Stefan Hajnoczi <stefanha@redhat.com>
>> > > wrote:
>> > >
>> > >> On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
>> > >> > Add SHMEM_MAP/UNMAP requests to vhost-user to
>> > >> > handle VIRTIO Shared Memory mappings.
>> > >> >
>> > >> > This request allows backends to dynamically map
>> > >> > fds into a VIRTIO Shared Memory Region indentified
>> > >> > by its `shmid`. Then, the fd memory is advertised
>> > >> > to the driver as a base addres + offset, so it
>> > >> > can be read/written (depending on the mmap flags
>> > >> > requested) while its valid.
>> > >> >
>> > >> > The backend can munmap the memory range
>> > >> > in a given VIRTIO Shared Memory Region (again,
>> > >> > identified by its `shmid`), to free it. Upon
>> > >> > receiving this message, the front-end must
>> > >> > mmap the regions with PROT_NONE to reserve
>> > >> > the virtual memory space.
>> > >> >
>> > >> > The device model needs to create MemoryRegion
>> > >> > instances for the VIRTIO Shared Memory Regions
>> > >> > and add them to the `VirtIODevice` instance.
>> > >> >
>> > >> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
>> > >> > ---
>> > >> >  docs/interop/vhost-user.rst               |  27 +++++
>> > >> >  hw/virtio/vhost-user.c                    | 122 ++++++++++++++++++++++
>> > >> >  hw/virtio/virtio.c                        |  12 +++
>> > >> >  include/hw/virtio/virtio.h                |   5 +
>> > >> >  subprojects/libvhost-user/libvhost-user.c |  65 ++++++++++++
>> > >> >  subprojects/libvhost-user/libvhost-user.h |  53 ++++++++++
>> > >> >  6 files changed, 284 insertions(+)
>> > >> >
>> > >> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
>> > >> > index d8419fd2f1..d52ba719d5 100644
>> > >> > --- a/docs/interop/vhost-user.rst
>> > >> > +++ b/docs/interop/vhost-user.rst
>> > >> > @@ -1859,6 +1859,33 @@ is sent by the front-end.
>> > >> >    when the operation is successful, or non-zero otherwise. Note that
>> > >> if the
>> > >> >    operation fails, no fd is sent to the backend.
>> > >> >
>> > >> > +``VHOST_USER_BACKEND_SHMEM_MAP``
>> > >> > +  :id: 9
>> > >> > +  :equivalent ioctl: N/A
>> > >> > +  :request payload: fd and ``struct VhostUserMMap``
>> > >> > +  :reply payload: N/A
>> > >> > +
>> > >> > +  This message can be submitted by the backends to advertise a new
>> > >> mapping
>> > >> > +  to be made in a given VIRTIO Shared Memory Region. Upon receiving
>> > >> the message,
>> > >> > +  The front-end will mmap the given fd into the VIRTIO Shared Memory
>> > >> Region
>> > >> > +  with the requested ``shmid``. A reply is generated indicating
>> > >> whether mapping
>> > >> > +  succeeded.
>> > >> > +
>> > >> > +  Mapping over an already existing map is not allowed and request
>> > >> shall fail.
>> > >> > +  Therefore, the memory range in the request must correspond with a
>> > >> valid,
>> > >> > +  free region of the VIRTIO Shared Memory Region.
>> > >> > +
>> > >> > +``VHOST_USER_BACKEND_SHMEM_UNMAP``
>> > >> > +  :id: 10
>> > >> > +  :equivalent ioctl: N/A
>> > >> > +  :request payload: ``struct VhostUserMMap``
>> > >> > +  :reply payload: N/A
>> > >> > +
>> > >> > +  This message can be submitted by the backends so that the front-end
>> > >> un-mmap
>> > >> > +  a given range (``offset``, ``len``) in the VIRTIO Shared Memory
>> > >> Region with
>> > >>
>> > >> s/offset/shm_offset/
>> > >>
>> > >> > +  the requested ``shmid``.
>> > >>
>> > >> Please clarify that <offset, len> must correspond to the entirety of a
>> > >> valid mapped region.
>> > >>
>> > >> By the way, the VIRTIO 1.3 gives the following behavior for the virtiofs
>> > >> DAX Window:
>> > >>
>> > >>   When a FUSE_SETUPMAPPING request perfectly overlaps a previous
>> > >>   mapping, the previous mapping is replaced. When a mapping partially
>> > >>   overlaps a previous mapping, the previous mapping is split into one or
>> > >>   two smaller mappings. When a mapping is partially unmapped it is also
>> > >>   split into one or two smaller mappings.
>> > >>
>> > >>   Establishing new mappings or splitting existing mappings consumes
>> > >>   resources. If the device runs out of resources the FUSE_SETUPMAPPING
>> > >>   request fails until resources are available again following
>> > >>   FUSE_REMOVEMAPPING.
>> > >>
>> > >> I think SETUPMAPPING/REMOVMAPPING can be implemented using
>> > >> SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
>> > >> partial ranges, but as far as I know that's not necessary for virtiofs
>> > >> in practice.
>> > >>
>> > >> It's worth mentioning that mappings consume resources and that SHMEM_MAP
>> > >> can fail when there are no resources available. The process-wide limit
>> > >> is vm.max_map_count on Linux although a vhost-user frontend may reduce
>> > >> it further to control vhost-user resource usage.
>> > >>
>> > >> > +  A reply is generated indicating whether unmapping succeeded.
>> > >> > +
>> > >> >  .. _reply_ack:
>> > >> >
>> > >> >  VHOST_USER_PROTOCOL_F_REPLY_ACK
>> > >> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>> > >> > index cdf9af4a4b..7ee8a472c6 100644
>> > >> > --- a/hw/virtio/vhost-user.c
>> > >> > +++ b/hw/virtio/vhost-user.c
>> > >> > @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
>> > >> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
>> > >> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
>> > >> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
>> > >> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
>> > >> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
>> > >> >      VHOST_USER_BACKEND_MAX
>> > >> >  }  VhostUserBackendRequest;
>> > >> >
>> > >> > @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
>> > >> >      unsigned char uuid[16];
>> > >> >  } VhostUserShared;
>> > >> >
>> > >> > +/* For the flags field of VhostUserMMap */
>> > >> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
>> > >> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
>> > >> > +
>> > >> > +typedef struct {
>> > >> > +    /* VIRTIO Shared Memory Region ID */
>> > >> > +    uint8_t shmid;
>> > >> > +    uint8_t padding[7];
>> > >> > +    /* File offset */
>> > >> > +    uint64_t fd_offset;
>> > >> > +    /* Offset within the VIRTIO Shared Memory Region */
>> > >> > +    uint64_t shm_offset;
>> > >> > +    /* Size of the mapping */
>> > >> > +    uint64_t len;
>> > >> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
>> > >> > +    uint64_t flags;
>> > >> > +} VhostUserMMap;
>> > >> > +
>> > >> >  typedef struct {
>> > >> >      VhostUserRequest request;
>> > >> >
>> > >> > @@ -224,6 +244,7 @@ typedef union {
>> > >> >          VhostUserInflight inflight;
>> > >> >          VhostUserShared object;
>> > >> >          VhostUserTransferDeviceState transfer_state;
>> > >> > +        VhostUserMMap mmap;
>> > >> >  } VhostUserPayload;
>> > >> >
>> > >> >  typedef struct VhostUserMsg {
>> > >> > @@ -1748,6 +1769,100 @@
>> > >> vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
>> > >> >      return 0;
>> > >> >  }
>> > >> >
>> > >> > +static int
>> > >> > +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
>> > >> > +                                    VhostUserMMap *vu_mmap,
>> > >> > +                                    int fd)
>> > >> > +{
>> > >> > +    void *addr = 0;
>> > >> > +    MemoryRegion *mr = NULL;
>> > >> > +
>> > >> > +    if (fd < 0) {
>> > >> > +        error_report("Bad fd for map");
>> > >> > +        return -EBADF;
>> > >> > +    }
>> > >> > +
>> > >> > +    if (!dev->vdev->shmem_list ||
>> > >> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
>> > >> > +        error_report("Device only has %d VIRTIO Shared Memory Regions.
>> > >> "
>> > >> > +                     "Requested ID: %d",
>> > >> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
>> > >> > +        return -EFAULT;
>> > >> > +    }
>> > >> > +
>> > >> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
>> > >> > +
>> > >> > +    if (!mr) {
>> > >> > +        error_report("VIRTIO Shared Memory Region at "
>> > >> > +                     "ID %d unitialized", vu_mmap->shmid);
>> > >> > +        return -EFAULT;
>> > >> > +    }
>> > >> > +
>> > >> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
>> > >> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
>> > >> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
>> > >> > +                     vu_mmap->shm_offset, vu_mmap->len);
>> > >> > +        return -EFAULT;
>> > >> > +    }
>> > >> > +
>> > >> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
>> > >> > +
>> > >> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>> > >>
>> > >> Missing check for overlap between range [shm_offset, shm_offset + len)
>> > >> and existing mappings.
>> > >>
>> > >
>> > > Not sure how to do this check. Specifically, I am not sure how previous
>> > > ranges are stored within the MemoryRegion. Is looping through
>> > > mr->subregions
>> > > a valid option?
>> > >
>> >
>> > Maybe something like this would do?
>> > ```
>> >      if (memory_region_find(mr, vu_mmap->shm_offset, vu_mmap->len).mr) {
>> >         error_report("Requested memory (%" PRIx64 "+%" PRIx64 " overalps "
>> >                      "with previously mapped memory",
>> >                      vu_mmap->shm_offset, vu_mmap->len);
>> >         return -EFAULT;
>> >     }
>> > ```
>>
>> I don't think that works because the QEMU MemoryRegion covers the entire
>> range, some of which contains mappings and some of which is empty. It
>> would be necessary to track mappings that have been made.
>>
>> I'm not aware of a security implication if the overlap check is missing,
>> so I guess it may be okay to skip it and rely on the vhost-user back-end
>> author to honor the spec. I'm not totally against that because it's
>> faster and less code, but it feels a bit iffy to not enforce the input
>> validation that the spec requires.
>>
>> Maintain a list of mappings so this check can be performed?
>>
>
> Ok, I prefer to aim for the better solution and see where that takes us.
> So I will add a mapped_regions list or something like that to the
> MemoryRegion struct in a new commit, so that it can be reviewed
> independently. With the infrastructure's code in the patch we can decide if
> it is worth to have it.

Great. MemoryRegion is a core struct that's not related to vhost-user.
I don't think anything else needs a mappings list. Maybe add the
mappings list to shmem_list[] elements instead so that each VIRTIO
Shared Memory Region has a mappings list? Each element could be a
struct with a MemoryRegion field and a mappings list.

Stefan

>
> Thank you!
>
>>
>> >
>> > >
>> > >
>> > >>
>> > >> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
>> > >> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
>> > >> > +        MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
>> > >> > +
>> > >> > +    if (addr == MAP_FAILED) {
>> > >> > +        error_report("Failed to mmap mem fd");
>> > >> > +        return -EFAULT;
>> > >> > +    }
>> > >> > +
>> > >> > +    return 0;
>> > >> > +}
>> > >> > +
>> > >> > +static int
>> > >> > +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
>> > >> > +                                      VhostUserMMap *vu_mmap)
>> > >> > +{
>> > >> > +    void *addr = 0;
>> > >> > +    MemoryRegion *mr = NULL;
>> > >> > +
>> > >> > +    if (!dev->vdev->shmem_list ||
>> > >> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
>> > >> > +        error_report("Device only has %d VIRTIO Shared Memory Regions.
>> > >> "
>> > >> > +                     "Requested ID: %d",
>> > >> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
>> > >> > +        return -EFAULT;
>> > >> > +    }
>> > >> > +
>> > >> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
>> > >> > +
>> > >> > +    if (!mr) {
>> > >> > +        error_report("VIRTIO Shared Memory Region at "
>> > >> > +                     "ID %d unitialized", vu_mmap->shmid);
>> > >> > +        return -EFAULT;
>> > >> > +    }
>> > >> > +
>> > >> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
>> > >> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
>> > >> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
>> > >> > +                     vu_mmap->shm_offset, vu_mmap->len);
>> > >> > +        return -EFAULT;
>> > >> > +    }
>> > >> > +
>> > >> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
>> > >> > +
>> > >> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>> > >>
>> > >> Missing check for existing mapping with exact range [shm_offset, len)
>> > >> match.
>> > >>
>> > >> > +                PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
>> > >> -1, 0);
>> > >> > +
>> > >> > +    if (addr == MAP_FAILED) {
>> > >> > +        error_report("Failed to unmap memory");
>> > >> > +        return -EFAULT;
>> > >> > +    }
>> > >> > +
>> > >> > +    return 0;
>> > >> > +}
>> > >> > +
>> > >> >  static void close_backend_channel(struct vhost_user *u)
>> > >> >  {
>> > >> >      g_source_destroy(u->backend_src);
>> > >> > @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc,
>> > >> GIOCondition condition,
>> > >> >          ret =
>> > >> vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
>> > >> >                                                               &hdr,
>> > >> &payload);
>> > >> >          break;
>> > >> > +    case VHOST_USER_BACKEND_SHMEM_MAP:
>> > >> > +        ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
>> > >> > +                                                  fd ? fd[0] : -1);
>> > >> > +        break;
>> > >> > +    case VHOST_USER_BACKEND_SHMEM_UNMAP:
>> > >> > +        ret = vhost_user_backend_handle_shmem_unmap(dev,
>> > >> &payload.mmap);
>> > >> > +        break;
>> > >> >      default:
>> > >> >          error_report("Received unexpected msg type: %d.", hdr.request);
>> > >> >          ret = -EINVAL;
>> > >> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> > >> > index 893a072c9d..9f2da5b11e 100644
>> > >> > --- a/hw/virtio/virtio.c
>> > >> > +++ b/hw/virtio/virtio.c
>> > >> > @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
>> > >> >      return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
>> > >> >  }
>> > >> >
>> > >> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
>> > >> > +{
>> > >> > +    MemoryRegion *mr = g_new0(MemoryRegion, 1);
>> > >> > +    ++vdev->n_shmem_regions;
>> > >> > +    vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
>> > >> > +                               vdev->n_shmem_regions);
>> > >>
>> > >> Where is shmem_list freed?
>> > >>
>> > >> The name "list" is misleading since this is an array, not a list.
>> > >>
>> > >> > +    vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
>> > >> > +    return mr;
>> > >> > +}
>> > >>
>> > >> This looks weird. The contents of mr are copied into shmem_list[] and
>> > >> then the pointer to mr is returned? Did you mean for the field's type to
>> > >> be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
>> > >> stash the pointer?
>> > >>
>> > >> > +
>> > >> >  /* A wrapper for use as a VMState .put function */
>> > >> >  static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
>> > >> >                                const VMStateField *field, JSONWriter
>> > >> *vmdesc)
>> > >> > @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t
>> > >> device_id, size_t config_size)
>> > >> >              virtio_vmstate_change, vdev);
>> > >> >      vdev->device_endian = virtio_default_endian();
>> > >> >      vdev->use_guest_notifier_mask = true;
>> > >> > +    vdev->shmem_list = NULL;
>> > >> > +    vdev->n_shmem_regions = 0;
>> > >> >  }
>> > >> >
>> > >> >  /*
>> > >> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>> > >> > index 7d5ffdc145..16d598aadc 100644
>> > >> > --- a/include/hw/virtio/virtio.h
>> > >> > +++ b/include/hw/virtio/virtio.h
>> > >> > @@ -165,6 +165,9 @@ struct VirtIODevice
>> > >> >       */
>> > >> >      EventNotifier config_notifier;
>> > >> >      bool device_iotlb_enabled;
>> > >> > +    /* Shared memory region for vhost-user mappings. */
>> > >> > +    MemoryRegion *shmem_list;
>> > >> > +    int n_shmem_regions;
>> > >> >  };
>> > >> >
>> > >> >  struct VirtioDeviceClass {
>> > >> > @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue
>> > >> *vq);
>> > >> >
>> > >> >  int virtio_save(VirtIODevice *vdev, QEMUFile *f);
>> > >> >
>> > >> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
>> > >> > +
>> > >> >  extern const VMStateInfo virtio_vmstate_info;
>> > >> >
>> > >> >  #define VMSTATE_VIRTIO_DEVICE \
>> > >> > diff --git a/subprojects/libvhost-user/libvhost-user.c
>> > >> b/subprojects/libvhost-user/libvhost-user.c
>> > >> > index a879149fef..28556d183a 100644
>> > >> > --- a/subprojects/libvhost-user/libvhost-user.c
>> > >> > +++ b/subprojects/libvhost-user/libvhost-user.c
>> > >> > @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char
>> > >> uuid[UUID_LEN])
>> > >> >      return vu_send_message(dev, &msg);
>> > >> >  }
>> > >> >
>> > >> > +bool
>> > >> > +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > >> > +             uint64_t shm_offset, uint64_t len, uint64_t flags)
>> > >> > +{
>> > >> > +    bool result = false;
>> > >> > +    VhostUserMsg msg_reply;
>> > >> > +    VhostUserMsg vmsg = {
>> > >> > +        .request = VHOST_USER_BACKEND_SHMEM_MAP,
>> > >> > +        .size = sizeof(vmsg.payload.mmap),
>> > >> > +        .flags = VHOST_USER_VERSION,
>> > >> > +        .payload.mmap = {
>> > >> > +            .shmid = shmid,
>> > >> > +            .fd_offset = fd_offset,
>> > >> > +            .shm_offset = shm_offset,
>> > >> > +            .len = len,
>> > >> > +            .flags = flags,
>> > >> > +        },
>> > >> > +    };
>> > >> > +
>> > >> > +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK))
>> > >> {
>> > >> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
>> > >> > +    }
>> > >> > +
>> > >> > +    pthread_mutex_lock(&dev->backend_mutex);
>> > >> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
>> > >> > +        pthread_mutex_unlock(&dev->backend_mutex);
>> > >> > +        return false;
>> > >> > +    }
>> > >> > +
>> > >> > +    /* Also unlocks the backend_mutex */
>> > >> > +    return vu_process_message_reply(dev, &vmsg);
>> > >> > +}
>> > >> > +
>> > >> > +bool
>> > >> > +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > >> > +               uint64_t shm_offset, uint64_t len)
>> > >> > +{
>> > >> > +    bool result = false;
>> > >> > +    VhostUserMsg msg_reply;
>> > >> > +    VhostUserMsg vmsg = {
>> > >> > +        .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
>> > >> > +        .size = sizeof(vmsg.payload.mmap),
>> > >> > +        .flags = VHOST_USER_VERSION,
>> > >> > +        .payload.mmap = {
>> > >> > +            .shmid = shmid,
>> > >> > +            .fd_offset = fd_offset,
>> > >>
>> > >> What is the meaning of this field? I expected it to be set to 0.
>> > >>
>> > >> > +            .shm_offset = shm_offset,
>> > >> > +            .len = len,
>> > >> > +        },
>> > >> > +    };
>> > >> > +
>> > >> > +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK))
>> > >> {
>> > >> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
>> > >> > +    }
>> > >> > +
>> > >> > +    pthread_mutex_lock(&dev->backend_mutex);
>> > >> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
>> > >> > +        pthread_mutex_unlock(&dev->backend_mutex);
>> > >> > +        return false;
>> > >> > +    }
>> > >> > +
>> > >> > +    /* Also unlocks the backend_mutex */
>> > >> > +    return vu_process_message_reply(dev, &vmsg);
>> > >> > +}
>> > >> > +
>> > >> >  static bool
>> > >> >  vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
>> > >> >  {
>> > >> > diff --git a/subprojects/libvhost-user/libvhost-user.h
>> > >> b/subprojects/libvhost-user/libvhost-user.h
>> > >> > index deb40e77b3..7f6c22cc1a 100644
>> > >> > --- a/subprojects/libvhost-user/libvhost-user.h
>> > >> > +++ b/subprojects/libvhost-user/libvhost-user.h
>> > >> > @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
>> > >> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
>> > >> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
>> > >> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
>> > >> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
>> > >> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
>> > >> >      VHOST_USER_BACKEND_MAX
>> > >> >  }  VhostUserBackendRequest;
>> > >> >
>> > >> > @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
>> > >> >      unsigned char uuid[UUID_LEN];
>> > >> >  } VhostUserShared;
>> > >> >
>> > >> > +/* For the flags field of VhostUserMMap */
>> > >> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
>> > >> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
>> > >> > +
>> > >> > +typedef struct {
>> > >> > +    /* VIRTIO Shared Memory Region ID */
>> > >> > +    uint8_t shmid;
>> > >> > +    uint8_t padding[7];
>> > >> > +    /* File offset */
>> > >> > +    uint64_t fd_offset;
>> > >> > +    /* Offset within the VIRTIO Shared Memory Region */
>> > >> > +    uint64_t shm_offset;
>> > >> > +    /* Size of the mapping */
>> > >> > +    uint64_t len;
>> > >> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
>> > >> > +    uint64_t flags;
>> > >> > +} VhostUserMMap;
>> > >> > +
>> > >> >  #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
>> > >> >  # define VU_PACKED __attribute__((gcc_struct, packed))
>> > >> >  #else
>> > >> > @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
>> > >> >          VhostUserVringArea area;
>> > >> >          VhostUserInflight inflight;
>> > >> >          VhostUserShared object;
>> > >> > +        VhostUserMMap mmap;
>> > >> >      } payload;
>> > >> >
>> > >> >      int fds[VHOST_MEMORY_BASELINE_NREGIONS];
>> > >> > @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned
>> > >> char uuid[UUID_LEN]);
>> > >> >   */
>> > >> >  bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
>> > >> >
>> > >> > +/**
>> > >> > + * vu_shmem_map:
>> > >> > + * @dev: a VuDev context
>> > >> > + * @shmid: VIRTIO Shared Memory Region ID
>> > >> > + * @fd_offset: File offset
>> > >> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
>> > >> > + * @len: Size of the mapping
>> > >> > + * @flags: Flags for the mmap operation
>> > >> > + *
>> > >> > + * Advertises a new mapping to be made in a given VIRTIO Shared Memory
>> > >> Region.
>> > >> > + *
>> > >> > + * Returns: TRUE on success, FALSE on failure.
>> > >> > + */
>> > >> > +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > >> > +                  uint64_t shm_offset, uint64_t len, uint64_t flags);
>> > >> > +
>> > >> > +/**
>> > >> > + * vu_shmem_map:
>> > >> > + * @dev: a VuDev context
>> > >> > + * @shmid: VIRTIO Shared Memory Region ID
>> > >> > + * @fd_offset: File offset
>> > >> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
>> > >> > + * @len: Size of the mapping
>> > >> > + *
>> > >> > + * The front-end un-mmaps a given range in the VIRTIO Shared Memory
>> > >> Region
>> > >> > + * with the requested `shmid`.
>> > >> > + *
>> > >> > + * Returns: TRUE on success, FALSE on failure.
>> > >> > + */
>> > >> > +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > >> > +                  uint64_t shm_offset, uint64_t len);
>> > >> > +
>> > >> >  /**
>> > >> >   * vu_queue_set_notification:
>> > >> >   * @dev: a VuDev context
>> > >> > --
>> > >> > 2.45.2
>> > >> >
>> > >>
>> > >
diff mbox series

Patch

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index d8419fd2f1..d52ba719d5 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -1859,6 +1859,33 @@  is sent by the front-end.
   when the operation is successful, or non-zero otherwise. Note that if the
   operation fails, no fd is sent to the backend.
 
+``VHOST_USER_BACKEND_SHMEM_MAP``
+  :id: 9
+  :equivalent ioctl: N/A
+  :request payload: fd and ``struct VhostUserMMap``
+  :reply payload: N/A
+
+  This message can be submitted by the backends to advertise a new mapping
+  to be made in a given VIRTIO Shared Memory Region. Upon receiving the message,
+  The front-end will mmap the given fd into the VIRTIO Shared Memory Region
+  with the requested ``shmid``. A reply is generated indicating whether mapping
+  succeeded.
+
+  Mapping over an already existing map is not allowed and request shall fail.
+  Therefore, the memory range in the request must correspond with a valid,
+  free region of the VIRTIO Shared Memory Region.
+
+``VHOST_USER_BACKEND_SHMEM_UNMAP``
+  :id: 10
+  :equivalent ioctl: N/A
+  :request payload: ``struct VhostUserMMap``
+  :reply payload: N/A
+
+  This message can be submitted by the backends so that the front-end un-mmap
+  a given range (``offset``, ``len``) in the VIRTIO Shared Memory Region with
+  the requested ``shmid``.
+  A reply is generated indicating whether unmapping succeeded.
+
 .. _reply_ack:
 
 VHOST_USER_PROTOCOL_F_REPLY_ACK
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index cdf9af4a4b..7ee8a472c6 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -115,6 +115,8 @@  typedef enum VhostUserBackendRequest {
     VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
     VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
     VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
+    VHOST_USER_BACKEND_SHMEM_MAP = 9,
+    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
     VHOST_USER_BACKEND_MAX
 }  VhostUserBackendRequest;
 
@@ -192,6 +194,24 @@  typedef struct VhostUserShared {
     unsigned char uuid[16];
 } VhostUserShared;
 
+/* For the flags field of VhostUserMMap */
+#define VHOST_USER_FLAG_MAP_R (1u << 0)
+#define VHOST_USER_FLAG_MAP_W (1u << 1)
+
+typedef struct {
+    /* VIRTIO Shared Memory Region ID */
+    uint8_t shmid;
+    uint8_t padding[7];
+    /* File offset */
+    uint64_t fd_offset;
+    /* Offset within the VIRTIO Shared Memory Region */
+    uint64_t shm_offset;
+    /* Size of the mapping */
+    uint64_t len;
+    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
+    uint64_t flags;
+} VhostUserMMap;
+
 typedef struct {
     VhostUserRequest request;
 
@@ -224,6 +244,7 @@  typedef union {
         VhostUserInflight inflight;
         VhostUserShared object;
         VhostUserTransferDeviceState transfer_state;
+        VhostUserMMap mmap;
 } VhostUserPayload;
 
 typedef struct VhostUserMsg {
@@ -1748,6 +1769,100 @@  vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
     return 0;
 }
 
+static int
+vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
+                                    VhostUserMMap *vu_mmap,
+                                    int fd)
+{
+    void *addr = 0;
+    MemoryRegion *mr = NULL;
+
+    if (fd < 0) {
+        error_report("Bad fd for map");
+        return -EBADF;
+    }
+
+    if (!dev->vdev->shmem_list ||
+        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
+        error_report("Device only has %d VIRTIO Shared Memory Regions. "
+                     "Requested ID: %d",
+                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
+        return -EFAULT;
+    }
+
+    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
+
+    if (!mr) {
+        error_report("VIRTIO Shared Memory Region at "
+                     "ID %d unitialized", vu_mmap->shmid);
+        return -EFAULT;
+    }
+
+    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
+        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
+        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
+                     vu_mmap->shm_offset, vu_mmap->len);
+        return -EFAULT;
+    }
+
+    void *shmem_ptr = memory_region_get_ram_ptr(mr);
+
+    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
+        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
+        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
+        MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
+
+    if (addr == MAP_FAILED) {
+        error_report("Failed to mmap mem fd");
+        return -EFAULT;
+    }
+
+    return 0;
+}
+
+static int
+vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
+                                      VhostUserMMap *vu_mmap)
+{
+    void *addr = 0;
+    MemoryRegion *mr = NULL;
+
+    if (!dev->vdev->shmem_list ||
+        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
+        error_report("Device only has %d VIRTIO Shared Memory Regions. "
+                     "Requested ID: %d",
+                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
+        return -EFAULT;
+    }
+
+    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
+
+    if (!mr) {
+        error_report("VIRTIO Shared Memory Region at "
+                     "ID %d unitialized", vu_mmap->shmid);
+        return -EFAULT;
+    }
+
+    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
+        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
+        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
+                     vu_mmap->shm_offset, vu_mmap->len);
+        return -EFAULT;
+    }
+
+    void *shmem_ptr = memory_region_get_ram_ptr(mr);
+
+    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
+                PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+
+    if (addr == MAP_FAILED) {
+        error_report("Failed to unmap memory");
+        return -EFAULT;
+    }
+
+    return 0;
+}
+
 static void close_backend_channel(struct vhost_user *u)
 {
     g_source_destroy(u->backend_src);
@@ -1816,6 +1931,13 @@  static gboolean backend_read(QIOChannel *ioc, GIOCondition condition,
         ret = vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
                                                              &hdr, &payload);
         break;
+    case VHOST_USER_BACKEND_SHMEM_MAP:
+        ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
+                                                  fd ? fd[0] : -1);
+        break;
+    case VHOST_USER_BACKEND_SHMEM_UNMAP:
+        ret = vhost_user_backend_handle_shmem_unmap(dev, &payload.mmap);
+        break;
     default:
         error_report("Received unexpected msg type: %d.", hdr.request);
         ret = -EINVAL;
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 893a072c9d..9f2da5b11e 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2856,6 +2856,16 @@  int virtio_save(VirtIODevice *vdev, QEMUFile *f)
     return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
 }
 
+MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
+{
+    MemoryRegion *mr = g_new0(MemoryRegion, 1);
+    ++vdev->n_shmem_regions;
+    vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
+                               vdev->n_shmem_regions);
+    vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
+    return mr;
+}
+
 /* A wrapper for use as a VMState .put function */
 static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
                               const VMStateField *field, JSONWriter *vmdesc)
@@ -3264,6 +3274,8 @@  void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size)
             virtio_vmstate_change, vdev);
     vdev->device_endian = virtio_default_endian();
     vdev->use_guest_notifier_mask = true;
+    vdev->shmem_list = NULL;
+    vdev->n_shmem_regions = 0;
 }
 
 /*
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 7d5ffdc145..16d598aadc 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -165,6 +165,9 @@  struct VirtIODevice
      */
     EventNotifier config_notifier;
     bool device_iotlb_enabled;
+    /* Shared memory region for vhost-user mappings. */
+    MemoryRegion *shmem_list;
+    int n_shmem_regions;
 };
 
 struct VirtioDeviceClass {
@@ -280,6 +283,8 @@  void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
 
 int virtio_save(VirtIODevice *vdev, QEMUFile *f);
 
+MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
+
 extern const VMStateInfo virtio_vmstate_info;
 
 #define VMSTATE_VIRTIO_DEVICE \
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index a879149fef..28556d183a 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -1586,6 +1586,71 @@  vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN])
     return vu_send_message(dev, &msg);
 }
 
+bool
+vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
+             uint64_t shm_offset, uint64_t len, uint64_t flags)
+{
+    bool result = false;
+    VhostUserMsg msg_reply;
+    VhostUserMsg vmsg = {
+        .request = VHOST_USER_BACKEND_SHMEM_MAP,
+        .size = sizeof(vmsg.payload.mmap),
+        .flags = VHOST_USER_VERSION,
+        .payload.mmap = {
+            .shmid = shmid,
+            .fd_offset = fd_offset,
+            .shm_offset = shm_offset,
+            .len = len,
+            .flags = flags,
+        },
+    };
+
+    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
+        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
+    }
+
+    pthread_mutex_lock(&dev->backend_mutex);
+    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
+        pthread_mutex_unlock(&dev->backend_mutex);
+        return false;
+    }
+
+    /* Also unlocks the backend_mutex */
+    return vu_process_message_reply(dev, &vmsg);
+}
+
+bool
+vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
+               uint64_t shm_offset, uint64_t len)
+{
+    bool result = false;
+    VhostUserMsg msg_reply;
+    VhostUserMsg vmsg = {
+        .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
+        .size = sizeof(vmsg.payload.mmap),
+        .flags = VHOST_USER_VERSION,
+        .payload.mmap = {
+            .shmid = shmid,
+            .fd_offset = fd_offset,
+            .shm_offset = shm_offset,
+            .len = len,
+        },
+    };
+
+    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
+        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
+    }
+
+    pthread_mutex_lock(&dev->backend_mutex);
+    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
+        pthread_mutex_unlock(&dev->backend_mutex);
+        return false;
+    }
+
+    /* Also unlocks the backend_mutex */
+    return vu_process_message_reply(dev, &vmsg);
+}
+
 static bool
 vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
 {
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index deb40e77b3..7f6c22cc1a 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -127,6 +127,8 @@  typedef enum VhostUserBackendRequest {
     VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
     VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
     VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
+    VHOST_USER_BACKEND_SHMEM_MAP = 9,
+    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
     VHOST_USER_BACKEND_MAX
 }  VhostUserBackendRequest;
 
@@ -186,6 +188,24 @@  typedef struct VhostUserShared {
     unsigned char uuid[UUID_LEN];
 } VhostUserShared;
 
+/* For the flags field of VhostUserMMap */
+#define VHOST_USER_FLAG_MAP_R (1u << 0)
+#define VHOST_USER_FLAG_MAP_W (1u << 1)
+
+typedef struct {
+    /* VIRTIO Shared Memory Region ID */
+    uint8_t shmid;
+    uint8_t padding[7];
+    /* File offset */
+    uint64_t fd_offset;
+    /* Offset within the VIRTIO Shared Memory Region */
+    uint64_t shm_offset;
+    /* Size of the mapping */
+    uint64_t len;
+    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
+    uint64_t flags;
+} VhostUserMMap;
+
 #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
 # define VU_PACKED __attribute__((gcc_struct, packed))
 #else
@@ -214,6 +234,7 @@  typedef struct VhostUserMsg {
         VhostUserVringArea area;
         VhostUserInflight inflight;
         VhostUserShared object;
+        VhostUserMMap mmap;
     } payload;
 
     int fds[VHOST_MEMORY_BASELINE_NREGIONS];
@@ -597,6 +618,38 @@  bool vu_add_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
  */
 bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
 
+/**
+ * vu_shmem_map:
+ * @dev: a VuDev context
+ * @shmid: VIRTIO Shared Memory Region ID
+ * @fd_offset: File offset
+ * @shm_offset: Offset within the VIRTIO Shared Memory Region
+ * @len: Size of the mapping
+ * @flags: Flags for the mmap operation
+ *
+ * Advertises a new mapping to be made in a given VIRTIO Shared Memory Region.
+ *
+ * Returns: TRUE on success, FALSE on failure.
+ */
+bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
+                  uint64_t shm_offset, uint64_t len, uint64_t flags);
+
+/**
+ * vu_shmem_map:
+ * @dev: a VuDev context
+ * @shmid: VIRTIO Shared Memory Region ID
+ * @fd_offset: File offset
+ * @shm_offset: Offset within the VIRTIO Shared Memory Region
+ * @len: Size of the mapping
+ *
+ * The front-end un-mmaps a given range in the VIRTIO Shared Memory Region
+ * with the requested `shmid`.
+ *
+ * Returns: TRUE on success, FALSE on failure.
+ */
+bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
+                  uint64_t shm_offset, uint64_t len);
+
 /**
  * vu_queue_set_notification:
  * @dev: a VuDev context