[RFC,v2,0/5] vhost-user: Add SHMEM_MAP/UNMAP requests

Message ID 20240628145710.1516121-1-aesteve@redhat.com (mailing list archive)

Albert Esteve June 28, 2024, 2:57 p.m. UTC
Hi all,

v1->v2:
- Corrected typos and clarifications from
  first review
- Added SHMEM_CONFIG frontend request to
  query VIRTIO shared memory regions from
  backends
- vhost-user-device to use SHMEM_CONFIG
  to request and initialise regions
- Added MEM_READ/WRITE backend requests
  in case address translation fails
  accessing VIRTIO Shared Memory Regions
  with MMAPs

This is an update of my attempt to have
backends support dynamic fd mapping into VIRTIO
Shared Memory Regions. After the first review
I have added more commits and new messages
to the vhost-user protocol.
However, I still have some doubts about
how this will work, especially regarding
the MEM_READ and MEM_WRITE commands.
Thus, I am still looking for feedback
to ensure that I am going in the right
direction with the implementation.

The use case for this patch is, e.g., to support
vhost-user-gpu RESOURCE_BLOB operations,
or DAX Window requests for virtio-fs. In
general, any operation where a backend
needs to request that the frontend mmap an
fd into a VIRTIO Shared Memory Region,
so that the guest can then access it.

After receiving the SHMEM_MAP/UNMAP request,
the frontend will perform the mmap with the
instructed parameters (i.e., shmid, shm_offset,
fd_offset, fd, length).

As there are already a couple of devices
that could benefit from such a feature,
and more could require it in the future,
the goal is to make the implementation
generic.

To that end, the VIRTIO Shared Memory
Region list is declared in the `VirtIODevice`
struct.

This patch series also includes a
SHMEM_CONFIG frontend request that is
specifically meant to allow the generic
vhost-user-device frontend to query
VIRTIO Shared Memory settings from the
backend (as this device is generic and
agnostic of the actual backend
configuration).

Finally, MEM_READ/WRITE backend requests are
added to deal with a potential issue when the
guest hands one backend a descriptor that
references a mapping created by another
backend. The first backend will not be able
to see these mappings, so these requests are
a fallback for when vhost-user memory
translation fails.

Albert Esteve (5):
  vhost-user: Add VIRTIO Shared Memory map request
  vhost_user: Add frontend command for shmem config
  vhost-user-dev: Add cache BAR
  vhost_user: Add MEM_READ/WRITE backend requests
  vhost_user: Implement mem_read/mem_write handlers

 docs/interop/vhost-user.rst               |  58 ++++++
 hw/virtio/vhost-user-base.c               |  39 +++-
 hw/virtio/vhost-user-device-pci.c         |  37 +++-
 hw/virtio/vhost-user.c                    | 221 ++++++++++++++++++++++
 hw/virtio/virtio.c                        |  12 ++
 include/hw/virtio/vhost-backend.h         |   6 +
 include/hw/virtio/vhost-user.h            |   1 +
 include/hw/virtio/virtio.h                |   5 +
 subprojects/libvhost-user/libvhost-user.c | 149 +++++++++++++++
 subprojects/libvhost-user/libvhost-user.h |  91 +++++++++
 10 files changed, 614 insertions(+), 5 deletions(-)

Comments

Stefan Hajnoczi July 11, 2024, 9:01 a.m. UTC | #1
On Fri, Jun 28, 2024 at 04:57:05PM +0200, Albert Esteve wrote:
> Hi all,
> 
> v1->v2:
> - Corrected typos and clarifications from
>   first review
> - Added SHMEM_CONFIG frontend request to
>   query VIRTIO shared memory regions from
>   backends
> - vhost-user-device to use SHMEM_CONFIG
>   to request and initialise regions
> - Added MEM_READ/WRITE backend requests
>   in case address translation fails
>   accessing VIRTIO Shared Memory Regions
>   with MMAPs

Hi Albert,
I will be offline next week. I've posted comments.

I think the hard part will be adjusting vhost-user backend code to make
use of MEM_READ/MEM_WRITE when address translation fails. Ideally every
guest memory access (including vring accesses) should fall back to
MEM_READ/MEM_WRITE.

A good test for MEM_READ/MEM_WRITE is to completely skip setting the
memory table from the frontend and fall back for every guest memory
access. If the vhost-user backend still works without a memory table
then you know MEM_READ/MEM_WRITE is working too.

The vhost-user spec should probably contain a comment explaining that
MEM_READ/MEM_WRITE may be necessary when other device backends use
SHMEM_MAP due to the incomplete memory table that prevents translating
those memory addresses. In other words, if the guest has a device that
uses SHMEM_MAP, then all other vhost-user devices should support
MEM_READ/MEM_WRITE in order to ensure that DMA works with Shared Memory
Regions.

Stefan

Alyssa Ross July 11, 2024, 10:56 a.m. UTC | #2
Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
crosvm a couple of years ago.

David, I'd be particularly interested for your thoughts on the MEM_READ
and MEM_WRITE commands, since as far as I know crosvm doesn't implement
anything like that.  The discussion leading to those being added starts
here:

https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/

It would be great if this could be standardised between QEMU and crosvm
(and therefore have a clearer path toward being implemented in other VMMs)!

David Stevens July 12, 2024, 2:06 a.m. UTC | #3
On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
>
> Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
> crosvm a couple of years ago.
>
> David, I'd be particularly interested for your thoughts on the MEM_READ
> and MEM_WRITE commands, since as far as I know crosvm doesn't implement
> anything like that.  The discussion leading to those being added starts
> here:
>
> https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
>
> It would be great if this could be standardised between QEMU and crosvm
> (and therefore have a clearer path toward being implemented in other VMMs)!

Setting aside vhost-user for a moment, the DAX example given by Stefan
won't work in crosvm today.

Is universal access to virtio shared memory regions actually mandated
by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
seems reasonable enough, but what about virtio-pmem to virtio-blk?
What about screenshotting a framebuffer in virtio-gpu shared memory to
virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
virtualized environment. But what about when you have real hardware
that speaks virtio involved? That's outside my wheelhouse, but it
doesn't seem like that would be easy to solve.

For what it's worth, my interpretation of the target scenario:

> Other backends don't see these mappings. If the guest submits a vring
> descriptor referencing a mapping to another backend, then that backend
> won't be able to access this memory

is that it's omitting how the implementation is reconciled with
section 2.10.1 of v1.3 of the virtio spec, which states that:

> References into shared memory regions are represented as offsets from
> the beginning of the region instead of absolute memory addresses. Offsets
> are used both for references between structures stored within shared
> memory and for requests placed in virtqueues that refer to shared memory.

My interpretation of that statement is that putting raw guest physical
addresses corresponding to virtio shared memory regions into a vring
is a driver spec violation.

-David
Michael S. Tsirkin July 12, 2024, 5:47 a.m. UTC | #4
On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> >
> > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
> > crosvm a couple of years ago.
> >
> > David, I'd be particularly interested for your thoughts on the MEM_READ
> > and MEM_WRITE commands, since as far as I know crosvm doesn't implement
> > anything like that.  The discussion leading to those being added starts
> > here:
> >
> > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> >
> > It would be great if this could be standardised between QEMU and crosvm
> > (and therefore have a clearer path toward being implemented in other VMMs)!
> 
> Setting aside vhost-user for a moment, the DAX example given by Stefan
> won't work in crosvm today.
> 
> Is universal access to virtio shared memory regions actually mandated
> by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
> seems reasonable enough, but what about virtio-pmem to virtio-blk?
> What about screenshotting a framebuffer in virtio-gpu shared memory to
> virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
> virtualized environment. But what about when you have real hardware
> that speaks virtio involved? That's outside my wheelhouse, but it
> doesn't seem like that would be easy to solve.

Yes, it can work for physical devices if allowed by host configuration.
E.g. VFIO supports that I think. Don't think VDPA does.

> For what it's worth, my interpretation of the target scenario:
> 
> > Other backends don't see these mappings. If the guest submits a vring
> > descriptor referencing a mapping to another backend, then that backend
> > won't be able to access this memory
> 
> is that it's omitting how the implementation is reconciled with
> section 2.10.1 of v1.3 of the virtio spec, which states that:
> 
> > References into shared memory regions are represented as offsets from
> > the beginning of the region instead of absolute memory addresses. Offsets
> > are used both for references between structures stored within shared
> > memory and for requests placed in virtqueues that refer to shared memory.
> 
> My interpretation of that statement is that putting raw guest physical
> addresses corresponding to virtio shared memory regions into a vring
> is a driver spec violation.
> 
> -David

This really applies within device I think. Should be clarified ...
Jason Wang July 15, 2024, 2:30 a.m. UTC | #5
On Fri, Jul 12, 2024 at 1:48 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> > >
> > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
> > > crosvm a couple of years ago.
> > >
> > > David, I'd be particularly interested for your thoughts on the MEM_READ
> > > and MEM_WRITE commands, since as far as I know crosvm doesn't implement
> > > anything like that.  The discussion leading to those being added starts
> > > here:
> > >
> > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> > >
> > > It would be great if this could be standardised between QEMU and crosvm
> > > (and therefore have a clearer path toward being implemented in other VMMs)!
> >
> > Setting aside vhost-user for a moment, the DAX example given by Stefan
> > won't work in crosvm today.
> >
> > Is universal access to virtio shared memory regions actually mandated
> > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
> > seems reasonable enough, but what about virtio-pmem to virtio-blk?
> > What about screenshotting a framebuffer in virtio-gpu shared memory to
> > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
> > virtualized environment. But what about when you have real hardware
> > that speaks virtio involved? That's outside my wheelhouse, but it
> > doesn't seem like that would be easy to solve.
>
> Yes, it can work for physical devices if allowed by host configuration.
> E.g. VFIO supports that I think. Don't think VDPA does.
>

I guess you meant iommufd support here?

Thanks
David Stevens July 16, 2024, 1:21 a.m. UTC | #6
On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> > >
> > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
> > > crosvm a couple of years ago.
> > >
> > > David, I'd be particularly interested for your thoughts on the MEM_READ
> > > and MEM_WRITE commands, since as far as I know crosvm doesn't implement
> > > anything like that.  The discussion leading to those being added starts
> > > here:
> > >
> > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> > >
> > > It would be great if this could be standardised between QEMU and crosvm
> > > (and therefore have a clearer path toward being implemented in other VMMs)!
> >
> > Setting aside vhost-user for a moment, the DAX example given by Stefan
> > won't work in crosvm today.
> >
> > Is universal access to virtio shared memory regions actually mandated
> > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
> > seems reasonable enough, but what about virtio-pmem to virtio-blk?
> > What about screenshotting a framebuffer in virtio-gpu shared memory to
> > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
> > virtualized environment. But what about when you have real hardware
> > that speaks virtio involved? That's outside my wheelhouse, but it
> > doesn't seem like that would be easy to solve.
>
> Yes, it can work for physical devices if allowed by host configuration.
> E.g. VFIO supports that I think. Don't think VDPA does.

I'm sure it can work, but that sounds more like a SHOULD (MAY?),
rather than a MUST.

> > For what it's worth, my interpretation of the target scenario:
> >
> > > Other backends don't see these mappings. If the guest submits a vring
> > > descriptor referencing a mapping to another backend, then that backend
> > > won't be able to access this memory
> >
> > is that it's omitting how the implementation is reconciled with
> > section 2.10.1 of v1.3 of the virtio spec, which states that:
> >
> > > References into shared memory regions are represented as offsets from
> > > the beginning of the region instead of absolute memory addresses. Offsets
> > > are used both for references between structures stored within shared
> > > memory and for requests placed in virtqueues that refer to shared memory.
> >
> > My interpretation of that statement is that putting raw guest physical
> > addresses corresponding to virtio shared memory regions into a vring
> > is a driver spec violation.
> >
> > -David
>
> This really applies within device I think. Should be clarified ...

You mean that a virtio device can use absolute memory addresses for
other devices' shared memory regions, but it can't use absolute memory
addresses for its own shared memory regions? That's a rather strange
requirement. Or is the statement simply giving an addressing strategy
that device type specifications are free to ignore?

-David
Albert Esteve Sept. 3, 2024, 8:42 a.m. UTC | #7
Hello all,

Sorry, I have been a bit disconnected from this thread as I was on
vacations and then had to switch tasks for a while.

I will try to go through all comments and address them for the first
non-RFC drop of this patch series.

But I was discussing this with some colleagues. It turns out
rust-vmm's vhost-user-gpu will potentially use this soon, and a
rust-vmm/vhost patch has already been posted:
https://github.com/rust-vmm/vhost/pull/251.
So I think it may make sense to:
1. Split out the vhost-user documentation patch once it is settled.
   Since it is taken as the official spec, having it upstreamed
   independently of the implementation will help other projects
   work on/integrate their own code.
2. Split the MEM_READ/WRITE messages from the SHMEM_MAP/UNMAP patches.
   If I remember correctly, they address a virtio-fs-specific issue
   that will not impact virtio-gpu, virtio-media, or any other device.
   So it may make sense to separate them so that one does not stall
   the other. I will try to have both integrated in the mid term.

WDYT?

BR,
Albert.

Stefan Hajnoczi Sept. 5, 2024, 3:56 p.m. UTC | #8
On Tue, Jul 16, 2024 at 10:21:35AM +0900, David Stevens wrote:
> On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> > > >
> > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
> > > > crosvm a couple of years ago.
> > > >
> > > > David, I'd be particularly interested for your thoughts on the MEM_READ
> > > > and MEM_WRITE commands, since as far as I know crosvm doesn't implement
> > > > anything like that.  The discussion leading to those being added starts
> > > > here:
> > > >
> > > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> > > >
> > > > It would be great if this could be standardised between QEMU and crosvm
> > > > (and therefore have a clearer path toward being implemented in other VMMs)!
> > >
> > > Setting aside vhost-user for a moment, the DAX example given by Stefan
> > > won't work in crosvm today.
> > >
> > > Is universal access to virtio shared memory regions actually mandated
> > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
> > > seems reasonable enough, but what about virtio-pmem to virtio-blk?
> > > What about screenshotting a framebuffer in virtio-gpu shared memory to
> > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
> > > virtualized environment. But what about when you have real hardware
> > > that speaks virtio involved? That's outside my wheelhouse, but it
> > > doesn't seem like that would be easy to solve.
> >
> > Yes, it can work for physical devices if allowed by host configuration.
> > E.g. VFIO supports that I think. Don't think VDPA does.
> 
> I'm sure it can work, but that sounds more like a SHOULD (MAY?),
> rather than a MUST.
> 
> > > For what it's worth, my interpretation of the target scenario:
> > >
> > > > Other backends don't see these mappings. If the guest submits a vring
> > > > descriptor referencing a mapping to another backend, then that backend
> > > > won't be able to access this memory
> > >
> > > is that it's omitting how the implementation is reconciled with
> > > section 2.10.1 of v1.3 of the virtio spec, which states that:
> > >
> > > > References into shared memory regions are represented as offsets from
> > > > the beginning of the region instead of absolute memory addresses. Offsets
> > > > are used both for references between structures stored within shared
> > > > memory and for requests placed in virtqueues that refer to shared memory.
> > >
> > > My interpretation of that statement is that putting raw guest physical
> > > addresses corresponding to virtio shared memory regions into a vring
> > > is a driver spec violation.
> > >
> > > -David
> >
> > This really applies within device I think. Should be clarified ...
> 
> You mean that a virtio device can use absolute memory addresses for
> other devices' shared memory regions, but it can't use absolute memory
> addresses for its own shared memory regions? That's a rather strange
> requirement. Or is the statement simply giving an addressing strategy
> that device type specifications are free to ignore?

My recollection of the intent behind the quoted section is:

1. Structures in shared memory that point to shared memory must use
   relative offsets instead of absolute physical addresses.
2. Virtqueue requests that refer to shared memory (e.g. map this page
   from virtiofs file to this location in shared memory) must use
   relative offsets instead of absolute physical addresses.

In other words, shared memory must be relocatable. Don't assume Shared
Memory Regions have an absolute guest physical address. This makes
device implementations independent of the guest physical memory layout
and might also help when Shared Memory Regions are exposed to guest
user-space where the guest physical memory layout isn't known.

Stefan
Stefan Hajnoczi Sept. 5, 2024, 4:39 p.m. UTC | #9
On Tue, Sep 03, 2024 at 10:42:34AM +0200, Albert Esteve wrote:
> Hello all,
> 
> Sorry, I have been a bit disconnected from this thread as I was on
> vacations and then had to switch tasks for a while.
> 
> I will try to go through all comments and address them for the first
> non-RFC drop of this patch series.
> 
> But I was discussing with some colleagues on this. So turns out rust-vmm's
> vhost-user-gpu will potentially use
> this soon, and a rust-vmm/vhost patch have been already posted:
> https://github.com/rust-vmm/vhost/pull/251.
> So I think it may make sense to:
> 1. Split the vhost-user documentation patch once settled. Since it is taken
> as the official spec,
>     having it upstreamed independently of the implementation will benefit
> other projects to
>     work/integrate their own code.
> 2. Split READ_/WRITE_MEM messages from SHMEM_MAP/_UNMAP patches.
>     If I remember correctly, this addresses a virtio-fs specific issue,
> that will not
>     impact either virtio-gpu nor virtio-media, or any other.

This is an architectural issue that arises from exposing VIRTIO Shared
Memory Regions in vhost-user. It was first seen with Linux virtiofs but
it could happen with other devices and/or guest operating systems.

Any VIRTIO Shared Memory Region that can be mmapped into Linux userspace
may trigger this issue. Userspace may write(2) to an O_DIRECT file with
the mmap as the source. The vhost-user-blk device will not be able to
access the source device's VIRTIO Shared Memory Region and will fail.

> So it may make
> sense
>     to separate them so that one does not stall the other. I will try to
> have both
>     integrated in the mid term.

If READ_/WRITE_MEM is a pain to implement (I think it is in the
vhost-user back-end, even though I've been a proponent of it), then
another way to deal with this issue is to specify that upon receiving
MAP/UNMAP messages, the vhost-user front-end must update the vhost-user
memory tables of all other vhost-user devices. That way vhost-user
devices will be able to access VIRTIO Shared Memory Regions mapped by
other devices.

Implementing this in QEMU should be much easier than implementing
READ_/WRITE_MEM support in device back-ends.

This will be slow and scale poorly but performance is only a problem for
devices that frequently MAP/UNMAP like virtiofs. Will virtio-gpu and
virtio-media use MAP/UNMAP often at runtime? They might be able to get
away with this simple solution.

I'd be happy with that. If someone wants to make virtiofs DAX faster,
they can implement READ/WRITE_MEM or another solution later, but let's
at least make things correct from the start.

Stefan

> > > > > descriptor referencing a mapping to another backend, then that
> > backend
> > > > > won't be able to access this memory
> > > >
> > > > is that it's omitting how the implementation is reconciled with
> > > > section 2.10.1 of v1.3 of the virtio spec, which states that:
> > > >
> > > > > References into shared memory regions are represented as offsets from
> > > > > the beginning of the region instead of absolute memory addresses.
> > Offsets
> > > > > are used both for references between structures stored within shared
> > > > > memory and for requests placed in virtqueues that refer to shared
> > memory.
> > > >
> > > > My interpretation of that statement is that putting raw guest physical
> > > > addresses corresponding to virtio shared memory regions into a vring
> > > > is a driver spec violation.
> > > >
> > > > -David
> > >
> > > This really applies within device I think. Should be clarified ...
> >
> > You mean that a virtio device can use absolute memory addresses for
> > other devices' shared memory regions, but it can't use absolute memory
> > addresses for its own shared memory regions? That's a rather strange
> > requirement. Or is the statement simply giving an addressing strategy
> > that device type specifications are free to ignore?
> >
> > -David
> >
> >
David Stevens Sept. 6, 2024, 4:18 a.m. UTC | #10
On Fri, Sep 6, 2024 at 12:56 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Tue, Jul 16, 2024 at 10:21:35AM +0900, David Stevens wrote:
> > On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> > > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> > > > >
> > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
> > > > > crosvm a couple of years ago.
> > > > >
> > > > > David, I'd be particularly interested for your thoughts on the MEM_READ
> > > > > and MEM_WRITE commands, since as far as I know crosvm doesn't implement
> > > > > anything like that.  The discussion leading to those being added starts
> > > > > here:
> > > > >
> > > > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> > > > >
> > > > > It would be great if this could be standardised between QEMU and crosvm
> > > > > (and therefore have a clearer path toward being implemented in other VMMs)!
> > > >
> > > > Setting aside vhost-user for a moment, the DAX example given by Stefan
> > > > won't work in crosvm today.
> > > >
> > > > Is universal access to virtio shared memory regions actually mandated
> > > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
> > > > seems reasonable enough, but what about virtio-pmem to virtio-blk?
> > > > What about screenshotting a framebuffer in virtio-gpu shared memory to
> > > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
> > > > virtualized environment. But what about when you have real hardware
> > > > that speaks virtio involved? That's outside my wheelhouse, but it
> > > > doesn't seem like that would be easy to solve.
> > >
> > > Yes, it can work for physical devices if allowed by host configuration.
> > > E.g. VFIO supports that I think. Don't think VDPA does.
> >
> > I'm sure it can work, but that sounds more like a SHOULD (MAY?),
> > rather than a MUST.
> >
> > > > For what it's worth, my interpretation of the target scenario:
> > > >
> > > > > Other backends don't see these mappings. If the guest submits a vring
> > > > > descriptor referencing a mapping to another backend, then that backend
> > > > > won't be able to access this memory
> > > >
> > > > is that it's omitting how the implementation is reconciled with
> > > > section 2.10.1 of v1.3 of the virtio spec, which states that:
> > > >
> > > > > References into shared memory regions are represented as offsets from
> > > > > the beginning of the region instead of absolute memory addresses. Offsets
> > > > > are used both for references between structures stored within shared
> > > > > memory and for requests placed in virtqueues that refer to shared memory.
> > > >
> > > > My interpretation of that statement is that putting raw guest physical
> > > > addresses corresponding to virtio shared memory regions into a vring
> > > > is a driver spec violation.
> > > >
> > > > -David
> > >
> > > This really applies within device I think. Should be clarified ...
> >
> > You mean that a virtio device can use absolute memory addresses for
> > other devices' shared memory regions, but it can't use absolute memory
> > addresses for its own shared memory regions? That's a rather strange
> > requirement. Or is the statement simply giving an addressing strategy
> > that device type specifications are free to ignore?
>
> My recollection of the intent behind the quoted section is:
>
> 1. Structures in shared memory that point to shared memory must use
>    relative offsets instead of absolute physical addresses.
> 2. Virtqueue requests that refer to shared memory (e.g. map this page
>    from virtiofs file to this location in shared memory) must use
>    relative offsets instead of absolute physical addresses.
>
> In other words, shared memory must be relocatable. Don't assume Shared
> Memory Regions have an absolute guest physical address. This makes
> device implementations independent of the guest physical memory layout
> and might also help when Shared Memory Regions are exposed to guest
> user-space where the guest physical memory layout isn't known.

Doesn't this discussion contradict the necessity of point 1? If I'm
understanding things correctly, it is valid for virtio device A to
refer to a structure in virtio device B's shared memory region by
absolute guest physical address. At least there is nothing in the spec
about resolving shmid values among different virtio devices, so
absolute guest physical addresses are the only way this sharing can be
done. And if it's valid for a pointer to a structure in a shared
memory region to exist, it's not clear to me why you can't have
pointers within a shared memory region.

It definitely makes sense that setting up a mapping should be done
with offsets. But unless a shared memory region can be dynamically
reallocated at runtime, then it doesn't seem necessary to ban pointers
within a shared memory region.

-David
Albert Esteve Sept. 6, 2024, 7:03 a.m. UTC | #11
On Thu, Sep 5, 2024 at 6:39 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:

> On Tue, Sep 03, 2024 at 10:42:34AM +0200, Albert Esteve wrote:
> > Hello all,
> >
> > Sorry, I have been a bit disconnected from this thread as I was on
> > vacations and then had to switch tasks for a while.
> >
> > I will try to go through all comments and address them for the first
> > non-RFC drop of this patch series.
> >
> > But I was discussing with some colleagues on this. So turns out
> rust-vmm's
> > vhost-user-gpu will potentially use
> > this soon, and a rust-vmm/vhost patch have been already posted:
> > https://github.com/rust-vmm/vhost/pull/251.
> > So I think it may make sense to:
> > 1. Split the vhost-user documentation patch once settled. Since it is
> taken
> > as the official spec,
> >     having it upstreamed independently of the implementation will benefit
> > other projects to
> >     work/integrate their own code.
> > 2. Split READ_/WRITE_MEM messages from SHMEM_MAP/_UNMAP patches.
> >     If I remember correctly, this addresses a virtio-fs specific issue,
> > that will not
> >     impact either virtio-gpu nor virtio-media, or any other.
>
> This is an architectural issue that arises from exposing VIRTIO Shared
> Memory Regions in vhost-user. It was first seen with Linux virtiofs but
> it could happen with other devices and/or guest operating systems.
>
> Any VIRTIO Shared Memory Region that can be mmapped into Linux userspace
> may trigger this issue. Userspace may write(2) to an O_DIRECT file with
> the mmap as the source. The vhost-user-blk device will not be able to
> access the source device's VIRTIO Shared Memory Region and will fail.
>
> > So it may make
> > sense
> >     to separate them so that one does not stall the other. I will try to
> > have both
> >     integrated in the mid term.
>
> If READ_/WRITE_MEM is a pain to implement (I think it is in the
> vhost-user back-end, even though I've been a proponent of it), then
> another way to deal with this issue is to specify that upon receiving
> MAP/UNMAP messages, the vhost-user front-end must update the vhost-user
> memory tables of all other vhost-user devices. That way vhost-user
> devices will be able to access VIRTIO Shared Memory Regions mapped by
> other devices.
>
> Implementing this in QEMU should be much easier than implementing
> READ_/WRITE_MEM support in device back-ends.
>
> This will be slow and scale poorly but performance is only a problem for
> devices that frequently MAP/UNMAP like virtiofs. Will virtio-gpu and
> virtio-media use MAP/UNMAP often at runtime? They might be able to get
> away with this simple solution.
>
> I'd be happy with that. If someone wants to make virtiofs DAX faster,
> they can implement READ/WRITE_MEM or another solution later, but let's
> at least make things correct from the start.
>

I agree. I want it to be correct first. If you agree on splitting the
spec bits out of this patch, I'm already happy. I suggested splitting the
READ_/WRITE_MEM messages because I thought they addressed a
virtiofs-specific issue.

The alternative that you proposed is interesting, and I'll take it into
account. But I'd prefer to go for the better solution first and, if I get
too entangled, switch to the easier implementation.

I think we could do this in 2 patches:
1. Split the documentation bits for SHMEM_MAP/_UNMAP. The
    implementation for these messages will go into the second patch.
2. The implementation patch: keep going for the time being with
    READ_/WRITE_MEM support, and keep the documentation for that
    within this patch. This way, if we later switch to having the
    frontend update the vhost-user memory tables, we are not locked
    into any specific solution even if patch 1 has already been merged.

BR,
Albert.


>
> Stefan
>
> >
> > WDYT?
> >
> > BR,
> > Albert.
> >
> > On Tue, Jul 16, 2024 at 3:21 AM David Stevens <stevensd@chromium.org>
> wrote:
> >
> > > On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com>
> wrote:
> > > >
> > > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> > > > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> > > > > >
> > > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP
> in
> > > > > > crosvm a couple of years ago.
> > > > > >
> > > > > > David, I'd be particularly interested for your thoughts on the
> > > MEM_READ
> > > > > > and MEM_WRITE commands, since as far as I know crosvm doesn't
> > > implement
> > > > > > anything like that.  The discussion leading to those being added
> > > starts
> > > > > > here:
> > > > > >
> > > > > >
> > >
> https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> > > > > >
> > > > > > It would be great if this could be standardised between QEMU and
> > > crosvm
> > > > > > (and therefore have a clearer path toward being implemented in
> other
> > > VMMs)!
> > > > >
> > > > > Setting aside vhost-user for a moment, the DAX example given by
> Stefan
> > > > > won't work in crosvm today.
> > > > >
> > > > > Is universal access to virtio shared memory regions actually
> mandated
> > > > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
> > > > > seems reasonable enough, but what about virtio-pmem to virtio-blk?
> > > > > What about screenshotting a framebuffer in virtio-gpu shared
> memory to
> > > > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable
> in a
> > > > > virtualized environment. But what about when you have real hardware
> > > > > that speaks virtio involved? That's outside my wheelhouse, but it
> > > > > doesn't seem like that would be easy to solve.
> > > >
> > > > Yes, it can work for physical devices if allowed by host
> configuration.
> > > > E.g. VFIO supports that I think. Don't think VDPA does.
> > >
> > > I'm sure it can work, but that sounds more like a SHOULD (MAY?),
> > > rather than a MUST.
> > >
> > > > > For what it's worth, my interpretation of the target scenario:
> > > > >
> > > > > > Other backends don't see these mappings. If the guest submits a
> vring
> > > > > > descriptor referencing a mapping to another backend, then that
> > > backend
> > > > > > won't be able to access this memory
> > > > >
> > > > > is that it's omitting how the implementation is reconciled with
> > > > > section 2.10.1 of v1.3 of the virtio spec, which states that:
> > > > >
> > > > > > References into shared memory regions are represented as offsets
> from
> > > > > > the beginning of the region instead of absolute memory addresses.
> > > Offsets
> > > > > > are used both for references between structures stored within
> shared
> > > > > > memory and for requests placed in virtqueues that refer to shared
> > > memory.
> > > > >
> > > > > My interpretation of that statement is that putting raw guest
> physical
> > > > > addresses corresponding to virtio shared memory regions into a
> vring
> > > > > is a driver spec violation.
> > > > >
> > > > > -David
> > > >
> > > > This really applies within device I think. Should be clarified ...
> > >
> > > You mean that a virtio device can use absolute memory addresses for
> > > other devices' shared memory regions, but it can't use absolute memory
> > > addresses for its own shared memory regions? That's a rather strange
> > > requirement. Or is the statement simply giving an addressing strategy
> > > that device type specifications are free to ignore?
> > >
> > > -David
> > >
> > >
>
Stefan Hajnoczi Sept. 6, 2024, 1 p.m. UTC | #12
On Fri, 6 Sept 2024 at 00:19, David Stevens <stevensd@chromium.org> wrote:
>
> On Fri, Sep 6, 2024 at 12:56 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Tue, Jul 16, 2024 at 10:21:35AM +0900, David Stevens wrote:
> > > On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> > > > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> > > > > >
> > > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
> > > > > > crosvm a couple of years ago.
> > > > > >
> > > > > > David, I'd be particularly interested for your thoughts on the MEM_READ
> > > > > > and MEM_WRITE commands, since as far as I know crosvm doesn't implement
> > > > > > anything like that.  The discussion leading to those being added starts
> > > > > > here:
> > > > > >
> > > > > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> > > > > >
> > > > > > It would be great if this could be standardised between QEMU and crosvm
> > > > > > (and therefore have a clearer path toward being implemented in other VMMs)!
> > > > >
> > > > > Setting aside vhost-user for a moment, the DAX example given by Stefan
> > > > > won't work in crosvm today.
> > > > >
> > > > > Is universal access to virtio shared memory regions actually mandated
> > > > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
> > > > > seems reasonable enough, but what about virtio-pmem to virtio-blk?
> > > > > What about screenshotting a framebuffer in virtio-gpu shared memory to
> > > > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
> > > > > virtualized environment. But what about when you have real hardware
> > > > > that speaks virtio involved? That's outside my wheelhouse, but it
> > > > > doesn't seem like that would be easy to solve.
> > > >
> > > > Yes, it can work for physical devices if allowed by host configuration.
> > > > E.g. VFIO supports that I think. Don't think VDPA does.
> > >
> > > I'm sure it can work, but that sounds more like a SHOULD (MAY?),
> > > rather than a MUST.
> > >
> > > > > For what it's worth, my interpretation of the target scenario:
> > > > >
> > > > > > Other backends don't see these mappings. If the guest submits a vring
> > > > > > descriptor referencing a mapping to another backend, then that backend
> > > > > > won't be able to access this memory
> > > > >
> > > > > is that it's omitting how the implementation is reconciled with
> > > > > section 2.10.1 of v1.3 of the virtio spec, which states that:
> > > > >
> > > > > > References into shared memory regions are represented as offsets from
> > > > > > the beginning of the region instead of absolute memory addresses. Offsets
> > > > > > are used both for references between structures stored within shared
> > > > > > memory and for requests placed in virtqueues that refer to shared memory.
> > > > >
> > > > > My interpretation of that statement is that putting raw guest physical
> > > > > addresses corresponding to virtio shared memory regions into a vring
> > > > > is a driver spec violation.
> > > > >
> > > > > -David
> > > >
> > > > This really applies within device I think. Should be clarified ...
> > >
> > > You mean that a virtio device can use absolute memory addresses for
> > > other devices' shared memory regions, but it can't use absolute memory
> > > addresses for its own shared memory regions? That's a rather strange
> > > requirement. Or is the statement simply giving an addressing strategy
> > > that device type specifications are free to ignore?
> >
> > My recollection of the intent behind the quoted section is:
> >
> > 1. Structures in shared memory that point to shared memory must use
> >    relative offsets instead of absolute physical addresses.
> > 2. Virtqueue requests that refer to shared memory (e.g. map this page
> >    from virtiofs file to this location in shared memory) must use
> >    relative offsets instead of absolute physical addresses.
> >
> > In other words, shared memory must be relocatable. Don't assume Shared
> > Memory Regions have an absolute guest physical address. This makes
> > device implementations independent of the guest physical memory layout
> > and might also help when Shared Memory Regions are exposed to guest
> > user-space where the guest physical memory layout isn't known.
>
> Doesn't this discussion contradict the necessity of point 1? If I'm
> understanding things correctly, it is valid for virtio device A to
> refer to a structure in virtio device B's shared memory region by
> absolute guest physical address. At least there is nothing in the spec
> about resolving shmid values among different virtio devices, so
> absolute guest physical addresses is the only way this sharing can be
> done. And if it's valid for a pointer to a structure in a shared
> memory region to exist, it's not clear to me why you can't have
> pointers within a shared memory region.

The reason is that VIRTIO has a layered design where the transport and
vring layout deal with bus addresses but device types generally do not
(except for specific exceptions like memory ballooning, etc).

A device's virtqueue requests do not contain addresses (e.g. struct
virtio_net_hdr). The virtqueue interface hides the details of memory
organization and access. In theory a transport could be implemented
over a medium that doesn't even offer shared memory (I think people
have played with remote VIRTIO over TCP) and this is possible because
device types don't know how virtqueue elements are represented.

This same design constraint extends to VIRTIO Shared Memory Regions
because a VIRTIO Shared Memory Region's contents are defined by the
device type, just like virtqueue requests. I mentioned that it avoids
address translation in device type implementations and also makes it
easy to expose VIRTIO Shared Memory Regions to guest userspace.

(Similarly, putting addresses into the VIRTIO Configuration Space is
also problematic because it exposes details of memory to the device
type. They should be hidden by the VIRTIO transport.)

That explains the intention within the VIRTIO world. The question you
raised was why you're allowed to then pass the address of a VIRTIO
Shared Memory Region to another device instead of passing a <shmid,
offset> pair. The answer is because DMA is beyond the scope of the
VIRTIO spec. If the architecture allows you to expose a buffer that
happens to be located in a VIRTIO Shared Memory Region to another
device, then it's possible to pass that address. The other device may
not even be a VIRTIO device. It just performs a DMA transaction to
read/write that memory. This is happening at another layer and it's a
valid thing to do.

So the answer is that in terms of designing VIRTIO device types,
VIRTIO Shared Memory Region structure layouts or virtqueue request
structs referring to VIRTIO Shared Memory Regions must not use
addresses. But you may be able to pass the address of a VIRTIO Shared
Memory Region to another device for DMA. They don't conflict because
they are at different levels.

> It definitely makes sense that setting up a mapping should be done
> with offsets. But unless a shared memory region can be dynamically
> reallocated at runtime, then it doesn't seem necessary to ban pointers
> within a shared memory region.

On a PCI device the BAR containing the VIRTIO Shared Memory Region can
be remapped at runtime, so the shared memory region can move.

The reason is the same as for why device types don't use addresses
inside virtqueue request structures: the details of memory are hidden
by the VIRTIO transport and the device type doesn't deal in addresses.

Also, it makes mmapping a VIRTIO Shared Memory Region difficult in
guest userspace because now some kind of address translation mechanism
is necessary (i.e. IOMMU) so that the userspace application (which
doesn't know about physical memory addresses) and the device
implementation can translate memory. Just using offsets avoids this
problem.
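A toy illustration of offset-based references (plain Python, nothing vhost-user specific): the same bytes remain valid no matter where the region happens to be mapped.

```python
# Structures inside a shared memory region reference each other by offset
# from the region base, never by absolute address, so the region contents
# are relocatable. Layout below is invented for the example.

import struct

region = bytearray(64)
# A "header" at offset 0 points at a payload using a relative offset (16),
# not an absolute address.
struct.pack_into("<Q", region, 0, 16)        # header.payload_offset = 16
region[16:21] = b"hello"

def read_payload(base_view):
    # Resolve the reference relative to wherever the region is mapped.
    (payload_off,) = struct.unpack_from("<Q", base_view, 0)
    return bytes(base_view[payload_off:payload_off + 5])

# Dereferencing works through any view of the same bytes.
print(read_payload(region))                  # b'hello'
print(read_payload(memoryview(region)))      # b'hello'
```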

Stefan
Stefan Hajnoczi Sept. 6, 2024, 1:15 p.m. UTC | #13
On Fri, 6 Sept 2024 at 03:06, Albert Esteve <aesteve@redhat.com> wrote:
> On Thu, Sep 5, 2024 at 6:39 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>
>> On Tue, Sep 03, 2024 at 10:42:34AM +0200, Albert Esteve wrote:
>> > Hello all,
>> >
>> > Sorry, I have been a bit disconnected from this thread as I was on
>> > vacations and then had to switch tasks for a while.
>> >
>> > I will try to go through all comments and address them for the first
>> > non-RFC drop of this patch series.
>> >
>> > But I was discussing with some colleagues on this. So turns out rust-vmm's
>> > vhost-user-gpu will potentially use
>> > this soon, and a rust-vmm/vhost patch have been already posted:
>> > https://github.com/rust-vmm/vhost/pull/251.
>> > So I think it may make sense to:
>> > 1. Split the vhost-user documentation patch once settled. Since it is taken
>> > as the official spec,
>> >     having it upstreamed independently of the implementation will benefit
>> > other projects to
>> >     work/integrate their own code.
>> > 2. Split READ_/WRITE_MEM messages from SHMEM_MAP/_UNMAP patches.
>> >     If I remember correctly, this addresses a virtio-fs specific issue,
>> > that will not
>> >     impact either virtio-gpu nor virtio-media, or any other.
>>
>> This is an architectural issue that arises from exposing VIRTIO Shared
>> Memory Regions in vhost-user. It was first seen with Linux virtiofs but
>> it could happen with other devices and/or guest operating systems.
>>
>> Any VIRTIO Shared Memory Region that can be mmapped into Linux userspace
>> may trigger this issue. Userspace may write(2) to an O_DIRECT file with
>> the mmap as the source. The vhost-user-blk device will not be able to
>> access the source device's VIRTIO Shared Memory Region and will fail.
>>
>> > So it may make
>> > sense
>> >     to separate them so that one does not stall the other. I will try to
>> > have both
>> >     integrated in the mid term.
>>
>> If READ_/WRITE_MEM is a pain to implement (I think it is in the
>> vhost-user back-end, even though I've been a proponent of it), then
>> another way to deal with this issue is to specify that upon receiving
>> MAP/UNMAP messages, the vhost-user front-end must update the vhost-user
>> memory tables of all other vhost-user devices. That way vhost-user
>> devices will be able to access VIRTIO Shared Memory Regions mapped by
>> other devices.
>>
>> Implementing this in QEMU should be much easier than implementing
>> READ_/WRITE_MEM support in device back-ends.
>>
>> This will be slow and scale poorly but performance is only a problem for
>> devices that frequently MAP/UNMAP like virtiofs. Will virtio-gpu and
>> virtio-media use MAP/UNMAP often at runtime? They might be able to get
>> away with this simple solution.
>>
>> I'd be happy with that. If someone wants to make virtiofs DAX faster,
>> they can implement READ/WRITE_MEM or another solution later, but let's
>> at least make things correct from the start.
>
>
> I agree. I want it to be correct first. If you agree on splitting the spec bits from this
> patch I'm already happy. I suggested splitting READ_/WRITE_MEM messages
> because I thought that it was a virtiofs-specific issue.
>
> The alternative that you proposed is interesting. I'll take it into account. But I
> feel I prefer to go for the better solution, and if I get too entangled, then switch
> to the easier implementation.

Great. The difficult part of implementing READ_/WRITE_MEM messages is
modifying libvhost-user and rust-vmm's vhost crate to send the new
messages when address translation fails. This needs to cover all
memory accesses (including vring struct accesses). That code may be a
few levels down in the call stack and assume it can always load/store
directly from mmapped memory.
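A minimal sketch of that fallback (hypothetical helper names, not libvhost-user's real API):

```python
# Back-end memory load that first tries its own vhost-user memory table
# and, on a translation miss, falls back to asking the front-end (which
# in the real protocol would be a MEM_READ message round trip).

class MemTable:
    def __init__(self, regions):
        # regions: list of (guest_addr, size, bytearray backing)
        self.regions = regions

    def translate(self, gpa, size):
        for base, rsize, backing in self.regions:
            if base <= gpa and gpa + size <= base + rsize:
                off = gpa - base
                return backing[off:off + size]
        return None  # address not covered by our mmapped regions


def mem_read(table, front_end_read, gpa, size):
    """Load `size` bytes at guest physical address `gpa`."""
    data = table.translate(gpa, size)
    if data is not None:
        return bytes(data)               # fast path: direct mmap access
    return front_end_read(gpa, size)     # slow path: ask the front-end


ram = bytearray(16)
ram[4:8] = b"ABCD"
table = MemTable([(0x1000, 16, ram)])

def front_end_read(gpa, size):           # stands in for a MEM_READ message
    return b"\xee" * size

print(mem_read(table, front_end_read, 0x1004, 4))  # b'ABCD' (fast path)
print(mem_read(table, front_end_read, 0x8000, 4))  # b'\xee\xee\xee\xee'
```

The hard part Stefan describes is that every load/store in the back-end, including vring struct accesses buried deep in the call stack, has to go through a helper like `mem_read` instead of dereferencing mmapped memory directly.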

>
> I think we could do this in 2 patches:
> 1. Split the documentation bits for SHMEM_MAP/_UNMAP. The
>     implementation for these messages will go into the second patch.
> 2. The implementation patch: keep going for the time being with
>      READ_/WRITE_MEM support. And the documentation for that
>     is kept it within this patch. This way if we switch to the frontend
>     updating vhost-user memory table, we weren't set in any specific
>     solution if patch 1 has been already merged.

I'm happy as long as the vhost-user spec patch that introduces
MAP/UNMAP also covers a solution for the memory access problem (either
READ_/WRITE_MEM or propagating mappings to all vhost-user back-ends).

Stefan

>
> BR,
> Albert.
>
>>
>>
>> Stefan
>>
>> >
>> > WDYT?
>> >
>> > BR,
>> > Albert.
>> >
>> > On Tue, Jul 16, 2024 at 3:21 AM David Stevens <stevensd@chromium.org> wrote:
>> >
>> > > On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>> > > >
>> > > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
>> > > > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
>> > > > > >
>> > > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in
>> > > > > > crosvm a couple of years ago.
>> > > > > >
>> > > > > > David, I'd be particularly interested for your thoughts on the
>> > > MEM_READ
>> > > > > > and MEM_WRITE commands, since as far as I know crosvm doesn't
>> > > implement
>> > > > > > anything like that.  The discussion leading to those being added
>> > > starts
>> > > > > > here:
>> > > > > >
>> > > > > >
>> > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
>> > > > > >
>> > > > > > It would be great if this could be standardised between QEMU and
>> > > crosvm
>> > > > > > (and therefore have a clearer path toward being implemented in other
>> > > VMMs)!
>> > > > >
>> > > > > Setting aside vhost-user for a moment, the DAX example given by Stefan
>> > > > > won't work in crosvm today.
>> > > > >
>> > > > > Is universal access to virtio shared memory regions actually mandated
>> > > > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing
>> > > > > seems reasonable enough, but what about virtio-pmem to virtio-blk?
>> > > > > What about screenshotting a framebuffer in virtio-gpu shared memory to
>> > > > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a
>> > > > > virtualized environment. But what about when you have real hardware
>> > > > > that speaks virtio involved? That's outside my wheelhouse, but it
>> > > > > doesn't seem like that would be easy to solve.
>> > > >
>> > > > Yes, it can work for physical devices if allowed by host configuration.
>> > > > E.g. VFIO supports that I think. Don't think VDPA does.
>> > >
>> > > I'm sure it can work, but that sounds more like a SHOULD (MAY?),
>> > > rather than a MUST.
>> > >
>> > > > > For what it's worth, my interpretation of the target scenario:
>> > > > >
>> > > > > > Other backends don't see these mappings. If the guest submits a vring
>> > > > > > descriptor referencing a mapping to another backend, then that
>> > > backend
>> > > > > > won't be able to access this memory
>> > > > >
>> > > > > is that it's omitting how the implementation is reconciled with
>> > > > > section 2.10.1 of v1.3 of the virtio spec, which states that:
>> > > > >
>> > > > > > References into shared memory regions are represented as offsets from
>> > > > > > the beginning of the region instead of absolute memory addresses.
>> > > Offsets
>> > > > > > are used both for references between structures stored within shared
>> > > > > > memory and for requests placed in virtqueues that refer to shared
>> > > memory.
>> > > > >
>> > > > > My interpretation of that statement is that putting raw guest physical
>> > > > > addresses corresponding to virtio shared memory regions into a vring
>> > > > > is a driver spec violation.
>> > > > >
>> > > > > -David
>> > > >
>> > > > This really applies within device I think. Should be clarified ...
>> > >
>> > > You mean that a virtio device can use absolute memory addresses for
>> > > other devices' shared memory regions, but it can't use absolute memory
>> > > addresses for its own shared memory regions? That's a rather strange
>> > > requirement. Or is the statement simply giving an addressing strategy
>> > > that device type specifications are free to ignore?
>> > >
>> > > -David
>> > >
>> > >