Message ID: 20240628145710.1516121-1-aesteve@redhat.com (mailing list archive)
Series: vhost-user: Add SHMEM_MAP/UNMAP requests
On Fri, Jun 28, 2024 at 04:57:05PM +0200, Albert Esteve wrote:
> Hi all,
>
> v1->v2:
> - Corrected typos and clarifications from first review
> - Added SHMEM_CONFIG frontend request to query VIRTIO shared memory regions from backends
> - vhost-user-device to use SHMEM_CONFIG to request and initialise regions
> - Added MEM_READ/WRITE backend requests in case address translation fails accessing VIRTIO Shared Memory Regions with MMAPs

Hi Albert,

I will be offline next week. I've posted comments. I think the hard part will be adjusting vhost-user backend code to make use of MEM_READ/MEM_WRITE when address translation fails. Ideally every guest memory access (including vring accesses) should fall back to MEM_READ/MEM_WRITE.

A good test for MEM_READ/MEM_WRITE is to completely skip setting the memory table from the frontend and fall back for every guest memory access. If the vhost-user backend still works without a memory table, then you know MEM_READ/MEM_WRITE is working too.

The vhost-user spec should probably contain a comment explaining that MEM_READ/MEM_WRITE may be necessary when other device backends use SHMEM_MAP, because the resulting memory table is incomplete and those addresses cannot be translated. In other words, if the guest has a device that uses SHMEM_MAP, then all other vhost-user devices should support MEM_READ/MEM_WRITE in order to ensure that DMA works with Shared Memory Regions.

Stefan

> This is an update of my attempt to have backends support dynamic fd mapping into VIRTIO Shared Memory Regions. After the first review I have added more commits and new messages to the vhost-user protocol. However, I still have some doubts as to how this will work, especially regarding the MEM_READ and MEM_WRITE commands. Thus, I am still looking for feedback, to ensure that I am going in the right direction with the implementation.
>
> The use case for this patch is, e.g., to support vhost-user-gpu RESOURCE_BLOB operations, or DAX Window requests for virtio-fs. In general, any operation where a backend needs to request that the frontend mmap an fd into a VIRTIO Shared Memory Region, so that the guest can then access it.
>
> After receiving the SHMEM_MAP/UNMAP request, the frontend will perform the mmap with the instructed parameters (i.e., shmid, shm_offset, fd_offset, fd, length).
>
> As there are already a couple of devices that could benefit from such a feature, and more could require it in the future, the goal is to make the implementation generic.
>
> To that end, the VIRTIO Shared Memory Region list is declared in the `VirtIODevice` struct.
>
> This patch also includes a SHMEM_CONFIG frontend request that is specifically meant to allow the generic vhost-user-device frontend to query VIRTIO Shared Memory settings from the backend (as this device is generic and agnostic of the actual backend configuration).
>
> Finally, MEM_READ/WRITE backend requests are added to deal with a potential issue when a descriptor handed to one backend references a mapping created by another backend. The first backend will not be able to see these mappings. So these requests are a fallback for when vhost-user memory translation fails.
>
> Albert Esteve (5):
>   vhost-user: Add VIRTIO Shared Memory map request
>   vhost_user: Add frontend command for shmem config
>   vhost-user-dev: Add cache BAR
>   vhost_user: Add MEM_READ/WRITE backend requests
>   vhost_user: Implement mem_read/mem_write handlers
>
>  docs/interop/vhost-user.rst               |  58 ++++++
>  hw/virtio/vhost-user-base.c               |  39 +++-
>  hw/virtio/vhost-user-device-pci.c         |  37 +++-
>  hw/virtio/vhost-user.c                    | 221 ++++++++++++++++++++++
>  hw/virtio/virtio.c                        |  12 ++
>  include/hw/virtio/vhost-backend.h         |   6 +
>  include/hw/virtio/vhost-user.h            |   1 +
>  include/hw/virtio/virtio.h                |   5 +
>  subprojects/libvhost-user/libvhost-user.c | 149 +++++++++++++++
>  subprojects/libvhost-user/libvhost-user.h |  91 +++++++++
>  10 files changed, 614 insertions(+), 5 deletions(-)
>
> --
> 2.45.2
Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in crosvm a couple of years ago.

David, I'd be particularly interested in your thoughts on the MEM_READ and MEM_WRITE commands, since as far as I know crosvm doesn't implement anything like that. The discussion leading to those being added starts here:

https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/

It would be great if this could be standardised between QEMU and crosvm (and therefore have a clearer path toward being implemented in other VMMs)!

Albert Esteve <aesteve@redhat.com> writes:
> Hi all,
>
> v1->v2:
> - Corrected typos and clarifications from first review
> - Added SHMEM_CONFIG frontend request to query VIRTIO shared memory regions from backends
> - vhost-user-device to use SHMEM_CONFIG to request and initialise regions
> - Added MEM_READ/WRITE backend requests in case address translation fails accessing VIRTIO Shared Memory Regions with MMAPs
>
> This is an update of my attempt to have backends support dynamic fd mapping into VIRTIO Shared Memory Regions. After the first review I have added more commits and new messages to the vhost-user protocol. However, I still have some doubts as to how this will work, especially regarding the MEM_READ and MEM_WRITE commands. Thus, I am still looking for feedback, to ensure that I am going in the right direction with the implementation.
>
> The use case for this patch is, e.g., to support vhost-user-gpu RESOURCE_BLOB operations, or DAX Window requests for virtio-fs. In general, any operation where a backend needs to request that the frontend mmap an fd into a VIRTIO Shared Memory Region, so that the guest can then access it.
>
> After receiving the SHMEM_MAP/UNMAP request, the frontend will perform the mmap with the instructed parameters (i.e., shmid, shm_offset, fd_offset, fd, length).
> > As there are already a couple devices > that could benefit of such a feature, > and more could require it in the future, > the goal is to make the implementation > generic. > > To that end, the VIRTIO Shared Memory > Region list is declared in the `VirtIODevice` > struct. > > This patch also includes: > SHMEM_CONFIG frontend request that is > specifically meant to allow generic > vhost-user-device frontend to be able to > query VIRTIO Shared Memory settings from the > backend (as this device is generic and agnostic > of the actual backend configuration). > > Finally, MEM_READ/WRITE backend requests are > added to deal with a potential issue when having > any backend sharing a descriptor that references > a mapping to another backend. The first > backend will not be able to see these > mappings. So these requests are a fallback > for vhost-user memory translation fails. > > Albert Esteve (5): > vhost-user: Add VIRTIO Shared Memory map request > vhost_user: Add frontend command for shmem config > vhost-user-dev: Add cache BAR > vhost_user: Add MEM_READ/WRITE backend requests > vhost_user: Implement mem_read/mem_write handlers > > docs/interop/vhost-user.rst | 58 ++++++ > hw/virtio/vhost-user-base.c | 39 +++- > hw/virtio/vhost-user-device-pci.c | 37 +++- > hw/virtio/vhost-user.c | 221 ++++++++++++++++++++++ > hw/virtio/virtio.c | 12 ++ > include/hw/virtio/vhost-backend.h | 6 + > include/hw/virtio/vhost-user.h | 1 + > include/hw/virtio/virtio.h | 5 + > subprojects/libvhost-user/libvhost-user.c | 149 +++++++++++++++ > subprojects/libvhost-user/libvhost-user.h | 91 +++++++++ > 10 files changed, 614 insertions(+), 5 deletions(-) > > -- > 2.45.2
On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
>
> Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in crosvm a couple of years ago.
>
> David, I'd be particularly interested for your thoughts on the MEM_READ and MEM_WRITE commands, since as far as I know crosvm doesn't implement anything like that. The discussion leading to those being added starts here:
>
> https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
>
> It would be great if this could be standardised between QEMU and crosvm (and therefore have a clearer path toward being implemented in other VMMs)!

Setting aside vhost-user for a moment, the DAX example given by Stefan won't work in crosvm today.

Is universal access to virtio shared memory regions actually mandated by the virtio spec? Copying from virtiofs DAX to virtiofs sharing seems reasonable enough, but what about virtio-pmem to virtio-blk? What about screenshotting a framebuffer in virtio-gpu shared memory to virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a virtualized environment. But what about when you have real hardware that speaks virtio involved? That's outside my wheelhouse, but it doesn't seem like that would be easy to solve.

For what it's worth, my interpretation of the target scenario:

> Other backends don't see these mappings. If the guest submits a vring
> descriptor referencing a mapping to another backend, then that backend
> won't be able to access this memory

is that it's omitting how the implementation is reconciled with section 2.10.1 of v1.3 of the virtio spec, which states that:

> References into shared memory regions are represented as offsets from
> the beginning of the region instead of absolute memory addresses. Offsets
> are used both for references between structures stored within shared
> memory and for requests placed in virtqueues that refer to shared memory.

My interpretation of that statement is that putting raw guest physical addresses corresponding to virtio shared memory regions into a vring is a driver spec violation.

-David
On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> >
> > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in crosvm a couple of years ago.
> >
> > David, I'd be particularly interested for your thoughts on the MEM_READ and MEM_WRITE commands, since as far as I know crosvm doesn't implement anything like that. The discussion leading to those being added starts here:
> >
> > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/
> >
> > It would be great if this could be standardised between QEMU and crosvm (and therefore have a clearer path toward being implemented in other VMMs)!
>
> Setting aside vhost-user for a moment, the DAX example given by Stefan won't work in crosvm today.
>
> Is universal access to virtio shared memory regions actually mandated by the virtio spec? Copying from virtiofs DAX to virtiofs sharing seems reasonable enough, but what about virtio-pmem to virtio-blk? What about screenshotting a framebuffer in virtio-gpu shared memory to virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a virtualized environment. But what about when you have real hardware that speaks virtio involved? That's outside my wheelhouse, but it doesn't seem like that would be easy to solve.

Yes, it can work for physical devices if allowed by host configuration. E.g. VFIO supports that I think. Don't think VDPA does.

> For what it's worth, my interpretation of the target scenario:
>
> > Other backends don't see these mappings. If the guest submits a vring
> > descriptor referencing a mapping to another backend, then that backend
> > won't be able to access this memory
>
> is that it's omitting how the implementation is reconciled with section 2.10.1 of v1.3 of the virtio spec, which states that:
>
> > References into shared memory regions are represented as offsets from
> > the beginning of the region instead of absolute memory addresses. Offsets
> > are used both for references between structures stored within shared
> > memory and for requests placed in virtqueues that refer to shared memory.
>
> My interpretation of that statement is that putting raw guest physical addresses corresponding to virtio shared memory regions into a vring is a driver spec violation.
>
> -David

This really applies within device I think. Should be clarified ...
On Fri, Jul 12, 2024 at 1:48 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote: > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote: > > > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in > > > crosvm a couple of years ago. > > > > > > David, I'd be particularly interested for your thoughts on the MEM_READ > > > and MEM_WRITE commands, since as far as I know crosvm doesn't implement > > > anything like that. The discussion leading to those being added starts > > > here: > > > > > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/ > > > > > > It would be great if this could be standardised between QEMU and crosvm > > > (and therefore have a clearer path toward being implemented in other VMMs)! > > > > Setting aside vhost-user for a moment, the DAX example given by Stefan > > won't work in crosvm today. > > > > Is universal access to virtio shared memory regions actually mandated > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing > > seems reasonable enough, but what about virtio-pmem to virtio-blk? > > What about screenshotting a framebuffer in virtio-gpu shared memory to > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a > > virtualized environment. But what about when you have real hardware > > that speaks virtio involved? That's outside my wheelhouse, but it > > doesn't seem like that would be easy to solve. > > Yes, it can work for physical devices if allowed by host configuration. > E.g. VFIO supports that I think. Don't think VDPA does. > I guess you meant iommufd support here? Thanks
On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote: > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote: > > > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in > > > crosvm a couple of years ago. > > > > > > David, I'd be particularly interested for your thoughts on the MEM_READ > > > and MEM_WRITE commands, since as far as I know crosvm doesn't implement > > > anything like that. The discussion leading to those being added starts > > > here: > > > > > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/ > > > > > > It would be great if this could be standardised between QEMU and crosvm > > > (and therefore have a clearer path toward being implemented in other VMMs)! > > > > Setting aside vhost-user for a moment, the DAX example given by Stefan > > won't work in crosvm today. > > > > Is universal access to virtio shared memory regions actually mandated > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing > > seems reasonable enough, but what about virtio-pmem to virtio-blk? > > What about screenshotting a framebuffer in virtio-gpu shared memory to > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a > > virtualized environment. But what about when you have real hardware > > that speaks virtio involved? That's outside my wheelhouse, but it > > doesn't seem like that would be easy to solve. > > Yes, it can work for physical devices if allowed by host configuration. > E.g. VFIO supports that I think. Don't think VDPA does. I'm sure it can work, but that sounds more like a SHOULD (MAY?), rather than a MUST. > > For what it's worth, my interpretation of the target scenario: > > > > > Other backends don't see these mappings. 
If the guest submits a vring > > > descriptor referencing a mapping to another backend, then that backend > > > won't be able to access this memory > > > > is that it's omitting how the implementation is reconciled with > > section 2.10.1 of v1.3 of the virtio spec, which states that: > > > > > References into shared memory regions are represented as offsets from > > > the beginning of the region instead of absolute memory addresses. Offsets > > > are used both for references between structures stored within shared > > > memory and for requests placed in virtqueues that refer to shared memory. > > > > My interpretation of that statement is that putting raw guest physical > > addresses corresponding to virtio shared memory regions into a vring > > is a driver spec violation. > > > > -David > > This really applies within device I think. Should be clarified ... You mean that a virtio device can use absolute memory addresses for other devices' shared memory regions, but it can't use absolute memory addresses for its own shared memory regions? That's a rather strange requirement. Or is the statement simply giving an addressing strategy that device type specifications are free to ignore? -David
Hello all,

Sorry, I have been a bit disconnected from this thread as I was on vacation and then had to switch tasks for a while.

I will try to go through all comments and address them for the first non-RFC drop of this patch series.

But I was discussing this with some colleagues. It turns out rust-vmm's vhost-user-gpu will potentially use this soon, and a rust-vmm/vhost patch has already been posted: https://github.com/rust-vmm/vhost/pull/251. So I think it may make sense to:
1. Split out the vhost-user documentation patch once settled. Since it is taken as the official spec, having it upstreamed independently of the implementation will let other projects work on/integrate their own code.
2. Split the READ_/WRITE_MEM messages from the SHMEM_MAP/_UNMAP patches. If I remember correctly, this addresses a virtio-fs specific issue that will not impact virtio-gpu, virtio-media, or any other device. So it may make sense to separate them so that one does not stall the other. I will try to have both integrated in the mid term.

WDYT?

BR,
Albert.

On Tue, Jul 16, 2024 at 3:21 AM David Stevens <stevensd@chromium.org> wrote:
> On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
> > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote:
> > > >
> > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in crosvm a couple of years ago.
> > > >
> > > > David, I'd be particularly interested for your thoughts on the MEM_READ and MEM_WRITE commands, since as far as I know crosvm doesn't implement anything like that.
The discussion leading to those being added > starts > > > > here: > > > > > > > > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/ > > > > > > > > It would be great if this could be standardised between QEMU and > crosvm > > > > (and therefore have a clearer path toward being implemented in other > VMMs)! > > > > > > Setting aside vhost-user for a moment, the DAX example given by Stefan > > > won't work in crosvm today. > > > > > > Is universal access to virtio shared memory regions actually mandated > > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing > > > seems reasonable enough, but what about virtio-pmem to virtio-blk? > > > What about screenshotting a framebuffer in virtio-gpu shared memory to > > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a > > > virtualized environment. But what about when you have real hardware > > > that speaks virtio involved? That's outside my wheelhouse, but it > > > doesn't seem like that would be easy to solve. > > > > Yes, it can work for physical devices if allowed by host configuration. > > E.g. VFIO supports that I think. Don't think VDPA does. > > I'm sure it can work, but that sounds more like a SHOULD (MAY?), > rather than a MUST. > > > > For what it's worth, my interpretation of the target scenario: > > > > > > > Other backends don't see these mappings. If the guest submits a vring > > > > descriptor referencing a mapping to another backend, then that > backend > > > > won't be able to access this memory > > > > > > is that it's omitting how the implementation is reconciled with > > > section 2.10.1 of v1.3 of the virtio spec, which states that: > > > > > > > References into shared memory regions are represented as offsets from > > > > the beginning of the region instead of absolute memory addresses. 
> Offsets > > > > are used both for references between structures stored within shared > > > > memory and for requests placed in virtqueues that refer to shared > memory. > > > > > > My interpretation of that statement is that putting raw guest physical > > > addresses corresponding to virtio shared memory regions into a vring > > > is a driver spec violation. > > > > > > -David > > > > This really applies within device I think. Should be clarified ... > > You mean that a virtio device can use absolute memory addresses for > other devices' shared memory regions, but it can't use absolute memory > addresses for its own shared memory regions? That's a rather strange > requirement. Or is the statement simply giving an addressing strategy > that device type specifications are free to ignore? > > -David > >
On Tue, Jul 16, 2024 at 10:21:35AM +0900, David Stevens wrote: > On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote: > > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote: > > > > > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in > > > > crosvm a couple of years ago. > > > > > > > > David, I'd be particularly interested for your thoughts on the MEM_READ > > > > and MEM_WRITE commands, since as far as I know crosvm doesn't implement > > > > anything like that. The discussion leading to those being added starts > > > > here: > > > > > > > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/ > > > > > > > > It would be great if this could be standardised between QEMU and crosvm > > > > (and therefore have a clearer path toward being implemented in other VMMs)! > > > > > > Setting aside vhost-user for a moment, the DAX example given by Stefan > > > won't work in crosvm today. > > > > > > Is universal access to virtio shared memory regions actually mandated > > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing > > > seems reasonable enough, but what about virtio-pmem to virtio-blk? > > > What about screenshotting a framebuffer in virtio-gpu shared memory to > > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a > > > virtualized environment. But what about when you have real hardware > > > that speaks virtio involved? That's outside my wheelhouse, but it > > > doesn't seem like that would be easy to solve. > > > > Yes, it can work for physical devices if allowed by host configuration. > > E.g. VFIO supports that I think. Don't think VDPA does. > > I'm sure it can work, but that sounds more like a SHOULD (MAY?), > rather than a MUST. > > > > For what it's worth, my interpretation of the target scenario: > > > > > > > Other backends don't see these mappings. 
> > > > If the guest submits a vring descriptor referencing a mapping to
> > > > another backend, then that backend won't be able to access this memory
> > >
> > > is that it's omitting how the implementation is reconciled with section 2.10.1 of v1.3 of the virtio spec, which states that:
> > >
> > > > References into shared memory regions are represented as offsets from
> > > > the beginning of the region instead of absolute memory addresses. Offsets
> > > > are used both for references between structures stored within shared
> > > > memory and for requests placed in virtqueues that refer to shared memory.
> > >
> > > My interpretation of that statement is that putting raw guest physical addresses corresponding to virtio shared memory regions into a vring is a driver spec violation.
> > >
> > > -David
> >
> > This really applies within device I think. Should be clarified ...
>
> You mean that a virtio device can use absolute memory addresses for other devices' shared memory regions, but it can't use absolute memory addresses for its own shared memory regions? That's a rather strange requirement. Or is the statement simply giving an addressing strategy that device type specifications are free to ignore?

My recollection of the intent behind the quoted section is:

1. Structures in shared memory that point to shared memory must use relative offsets instead of absolute physical addresses.

2. Virtqueue requests that refer to shared memory (e.g. map this page from a virtiofs file to this location in shared memory) must use relative offsets instead of absolute physical addresses.

In other words, shared memory must be relocatable. Don't assume Shared Memory Regions have an absolute guest physical address. This makes device implementations independent of the guest physical memory layout and might also help when Shared Memory Regions are exposed to guest user-space where the guest physical memory layout isn't known.

Stefan
On Tue, Sep 03, 2024 at 10:42:34AM +0200, Albert Esteve wrote:
> Hello all,
>
> Sorry, I have been a bit disconnected from this thread as I was on vacations and then had to switch tasks for a while.
>
> I will try to go through all comments and address them for the first non-RFC drop of this patch series.
>
> But I was discussing with some colleagues on this. So turns out rust-vmm's vhost-user-gpu will potentially use this soon, and a rust-vmm/vhost patch have been already posted: https://github.com/rust-vmm/vhost/pull/251.
> So I think it may make sense to:
> 1. Split the vhost-user documentation patch once settled. Since it is taken as the official spec, having it upstreamed independently of the implementation will benefit other projects to work/integrate their own code.
> 2. Split READ_/WRITE_MEM messages from SHMEM_MAP/_UNMAP patches. If I remember correctly, this addresses a virtio-fs specific issue, that will not impact either virtio-gpu nor virtio-media, or any other.

This is an architectural issue that arises from exposing VIRTIO Shared Memory Regions in vhost-user. It was first seen with Linux virtiofs but it could happen with other devices and/or guest operating systems.

Any VIRTIO Shared Memory Region that can be mmapped into Linux userspace may trigger this issue. Userspace may write(2) to an O_DIRECT file with the mmap as the source. The vhost-user-blk device will not be able to access the source device's VIRTIO Shared Memory Region and will fail.

> So it may make sense to separate them so that one does not stall the other. I will try to have both integrated in the mid term.

If READ_/WRITE_MEM is a pain to implement (I think it is in the vhost-user back-end, even though I've been a proponent of it), then another way to deal with this issue is to specify that upon receiving MAP/UNMAP messages, the vhost-user front-end must update the vhost-user memory tables of all other vhost-user devices.
That way vhost-user devices will be able to access VIRTIO Shared Memory Regions mapped by other devices. Implementing this in QEMU should be much easier than implementing READ_/WRITE_MEM support in device back-ends. This will be slow and scale poorly but performance is only a problem for devices that frequently MAP/UNMAP like virtiofs. Will virtio-gpu and virtio-media use MAP/UNMAP often at runtime? They might be able to get away with this simple solution. I'd be happy with that. If someone wants to make virtiofs DAX faster, they can implement READ/WRITE_MEM or another solution later, but let's at least make things correct from the start. Stefan > > WDYT? > > BR, > Albert. > > On Tue, Jul 16, 2024 at 3:21 AM David Stevens <stevensd@chromium.org> wrote: > > > On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote: > > > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote: > > > > > > > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in > > > > > crosvm a couple of years ago. > > > > > > > > > > David, I'd be particularly interested for your thoughts on the > > MEM_READ > > > > > and MEM_WRITE commands, since as far as I know crosvm doesn't > > implement > > > > > anything like that. The discussion leading to those being added > > starts > > > > > here: > > > > > > > > > > > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/ > > > > > > > > > > It would be great if this could be standardised between QEMU and > > crosvm > > > > > (and therefore have a clearer path toward being implemented in other > > VMMs)! > > > > > > > > Setting aside vhost-user for a moment, the DAX example given by Stefan > > > > won't work in crosvm today. > > > > > > > > Is universal access to virtio shared memory regions actually mandated > > > > by the virtio spec? 
Copying from virtiofs DAX to virtiofs sharing > > > > seems reasonable enough, but what about virtio-pmem to virtio-blk? > > > > What about screenshotting a framebuffer in virtio-gpu shared memory to > > > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a > > > > virtualized environment. But what about when you have real hardware > > > > that speaks virtio involved? That's outside my wheelhouse, but it > > > > doesn't seem like that would be easy to solve. > > > > > > Yes, it can work for physical devices if allowed by host configuration. > > > E.g. VFIO supports that I think. Don't think VDPA does. > > > > I'm sure it can work, but that sounds more like a SHOULD (MAY?), > > rather than a MUST. > > > > > > For what it's worth, my interpretation of the target scenario: > > > > > > > > > Other backends don't see these mappings. If the guest submits a vring > > > > > descriptor referencing a mapping to another backend, then that > > backend > > > > > won't be able to access this memory > > > > > > > > is that it's omitting how the implementation is reconciled with > > > > section 2.10.1 of v1.3 of the virtio spec, which states that: > > > > > > > > > References into shared memory regions are represented as offsets from > > > > > the beginning of the region instead of absolute memory addresses. > > Offsets > > > > > are used both for references between structures stored within shared > > > > > memory and for requests placed in virtqueues that refer to shared > > memory. > > > > > > > > My interpretation of that statement is that putting raw guest physical > > > > addresses corresponding to virtio shared memory regions into a vring > > > > is a driver spec violation. > > > > > > > > -David > > > > > > This really applies within device I think. Should be clarified ... 
> > > > You mean that a virtio device can use absolute memory addresses for > > other devices' shared memory regions, but it can't use absolute memory > > addresses for its own shared memory regions? That's a rather strange > > requirement. Or is the statement simply giving an addressing strategy > > that device type specifications are free to ignore? > > > > -David > > > >
On Fri, Sep 6, 2024 at 12:56 AM Stefan Hajnoczi <stefanha@redhat.com> wrote: > > On Tue, Jul 16, 2024 at 10:21:35AM +0900, David Stevens wrote: > > On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote: > > > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote: > > > > > > > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in > > > > > crosvm a couple of years ago. > > > > > > > > > > David, I'd be particularly interested for your thoughts on the MEM_READ > > > > > and MEM_WRITE commands, since as far as I know crosvm doesn't implement > > > > > anything like that. The discussion leading to those being added starts > > > > > here: > > > > > > > > > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/ > > > > > > > > > > It would be great if this could be standardised between QEMU and crosvm > > > > > (and therefore have a clearer path toward being implemented in other VMMs)! > > > > > > > > Setting aside vhost-user for a moment, the DAX example given by Stefan > > > > won't work in crosvm today. > > > > > > > > Is universal access to virtio shared memory regions actually mandated > > > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing > > > > seems reasonable enough, but what about virtio-pmem to virtio-blk? > > > > What about screenshotting a framebuffer in virtio-gpu shared memory to > > > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a > > > > virtualized environment. But what about when you have real hardware > > > > that speaks virtio involved? That's outside my wheelhouse, but it > > > > doesn't seem like that would be easy to solve. > > > > > > Yes, it can work for physical devices if allowed by host configuration. > > > E.g. VFIO supports that I think. Don't think VDPA does. 
> > > > I'm sure it can work, but that sounds more like a SHOULD (MAY?), > > rather than a MUST. > > > > > > For what it's worth, my interpretation of the target scenario: > > > > > > > > > Other backends don't see these mappings. If the guest submits a vring > > > > > descriptor referencing a mapping to another backend, then that backend > > > > > won't be able to access this memory > > > > > > > > is that it's omitting how the implementation is reconciled with > > > > section 2.10.1 of v1.3 of the virtio spec, which states that: > > > > > > > > > References into shared memory regions are represented as offsets from > > > > > the beginning of the region instead of absolute memory addresses. Offsets > > > > > are used both for references between structures stored within shared > > > > > memory and for requests placed in virtqueues that refer to shared memory. > > > > > > > > My interpretation of that statement is that putting raw guest physical > > > > addresses corresponding to virtio shared memory regions into a vring > > > > is a driver spec violation. > > > > > > > > -David > > > > > > This really applies within device I think. Should be clarified ... > > > > You mean that a virtio device can use absolute memory addresses for > > other devices' shared memory regions, but it can't use absolute memory > > addresses for its own shared memory regions? That's a rather strange > > requirement. Or is the statement simply giving an addressing strategy > > that device type specifications are free to ignore? > > My recollection of the intent behind the quoted section is: > > 1. Structures in shared memory that point to shared memory must use > relative offsets instead of absolute physical addresses. > 2. Virtqueue requests that refer to shared memory (e.g. map this page > from virtiofs file to this location in shared memory) must use > relative offsets instead of absolute physical addresses. > > In other words, shared memory must be relocatable. 
Don't assume Shared > Memory Regions have an absolute guest physical address. This makes > device implementations independent of the guest physical memory layout > and might also help when Shared Memory Regions are exposed to guest > user-space where the guest physical memory layout isn't known. Doesn't this discussion contradict the necessity of point 1? If I'm understanding things correctly, it is valid for virtio device A to refer to a structure in virtio device B's shared memory region by absolute guest physical address. At least there is nothing in the spec about resolving shmid values among different virtio devices, so absolute guest physical addresses are the only way this sharing can be done. And if it's valid for a pointer to a structure in a shared memory region to exist, it's not clear to me why you can't have pointers within a shared memory region. It definitely makes sense that setting up a mapping should be done with offsets. But unless a shared memory region can be dynamically reallocated at runtime, then it doesn't seem necessary to ban pointers within a shared memory region. -David
On Thu, Sep 5, 2024 at 6:39 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: > On Tue, Sep 03, 2024 at 10:42:34AM +0200, Albert Esteve wrote: > > Hello all, > > > > Sorry, I have been a bit disconnected from this thread as I was on > > vacations and then had to switch tasks for a while. > > > > I will try to go through all comments and address them for the first > > non-RFC drop of this patch series. > > > > But I was discussing with some colleagues on this. So turns out > rust-vmm's > > vhost-user-gpu will potentially use > > this soon, and a rust-vmm/vhost patch has already been posted: > > https://github.com/rust-vmm/vhost/pull/251. > > So I think it may make sense to: > > 1. Split the vhost-user documentation patch once settled. Since it is > taken > > as the official spec, > > having it upstreamed independently of the implementation will help > > other projects > > work on/integrate their own code. > > 2. Split READ_/WRITE_MEM messages from SHMEM_MAP/_UNMAP patches. > > If I remember correctly, this addresses a virtio-fs specific issue > > that will not > > impact either virtio-gpu or virtio-media, or any other. > > This is an architectural issue that arises from exposing VIRTIO Shared > Memory Regions in vhost-user. It was first seen with Linux virtiofs but > it could happen with other devices and/or guest operating systems. > > Any VIRTIO Shared Memory Region that can be mmapped into Linux userspace > may trigger this issue. Userspace may write(2) to an O_DIRECT file with > the mmap as the source. The vhost-user-blk device will not be able to > access the source device's VIRTIO Shared Memory Region and will fail. > > > So it may make > > sense > > to separate them so that one does not stall the other. I will try to > > have both > > integrated in the mid term. 
> > If READ_/WRITE_MEM is a pain to implement (I think it is in the > vhost-user back-end, even though I've been a proponent of it), then > another way to deal with this issue is to specify that upon receiving > MAP/UNMAP messages, the vhost-user front-end must update the vhost-user > memory tables of all other vhost-user devices. That way vhost-user > devices will be able to access VIRTIO Shared Memory Regions mapped by > other devices. > > Implementing this in QEMU should be much easier than implementing > READ_/WRITE_MEM support in device back-ends. > > This will be slow and scale poorly but performance is only a problem for > devices that frequently MAP/UNMAP like virtiofs. Will virtio-gpu and > virtio-media use MAP/UNMAP often at runtime? They might be able to get > away with this simple solution. > > I'd be happy with that. If someone wants to make virtiofs DAX faster, > they can implement READ/WRITE_MEM or another solution later, but let's > at least make things correct from the start. > I agree. I want it to be correct first. If you agree on splitting the spec bits from this patch I'm already happy. I suggested splitting READ_/WRITE_MEM messages because I thought that it was a virtiofs-specific issue. The alternative that you proposed is interesting. I'll take it into account. But I feel I prefer to go for the better solution, and if I get too entangled, then switch to the easier implementation. I think we could do this in 2 patches: 1. Split the documentation bits for SHMEM_MAP/_UNMAP. The implementation for these messages will go into the second patch. 2. The implementation patch: keep going for the time being with READ_/WRITE_MEM support. And the documentation for that is kept within this patch. This way, if we switch to the frontend updating the vhost-user memory tables, we won't be tied to any specific solution if patch 1 has already been merged. BR, Albert. > > Stefan > > > > > WDYT? > > > > BR, > > Albert. 
> > > > On Tue, Jul 16, 2024 at 3:21 AM David Stevens <stevensd@chromium.org> > wrote: > > > > > On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> > wrote: > > > > > > > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote: > > > > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote: > > > > > > > > > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP > in > > > > > > crosvm a couple of years ago. > > > > > > > > > > > > David, I'd be particularly interested for your thoughts on the > > > MEM_READ > > > > > > and MEM_WRITE commands, since as far as I know crosvm doesn't > > > implement > > > > > > anything like that. The discussion leading to those being added > > > starts > > > > > > here: > > > > > > > > > > > > > > > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/ > > > > > > > > > > > > It would be great if this could be standardised between QEMU and > > > crosvm > > > > > > (and therefore have a clearer path toward being implemented in > other > > > VMMs)! > > > > > > > > > > Setting aside vhost-user for a moment, the DAX example given by > Stefan > > > > > won't work in crosvm today. > > > > > > > > > > Is universal access to virtio shared memory regions actually > mandated > > > > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing > > > > > seems reasonable enough, but what about virtio-pmem to virtio-blk? > > > > > What about screenshotting a framebuffer in virtio-gpu shared > memory to > > > > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable > in a > > > > > virtualized environment. But what about when you have real hardware > > > > > that speaks virtio involved? That's outside my wheelhouse, but it > > > > > doesn't seem like that would be easy to solve. > > > > > > > > Yes, it can work for physical devices if allowed by host > configuration. > > > > E.g. VFIO supports that I think. Don't think VDPA does. 
> > > > > > I'm sure it can work, but that sounds more like a SHOULD (MAY?), > > > rather than a MUST. > > > > > > > > For what it's worth, my interpretation of the target scenario: > > > > > > > > > > > Other backends don't see these mappings. If the guest submits a > vring > > > > > > descriptor referencing a mapping to another backend, then that > > > backend > > > > > > won't be able to access this memory > > > > > > > > > > is that it's omitting how the implementation is reconciled with > > > > > section 2.10.1 of v1.3 of the virtio spec, which states that: > > > > > > > > > > > References into shared memory regions are represented as offsets > from > > > > > > the beginning of the region instead of absolute memory addresses. > > > Offsets > > > > > > are used both for references between structures stored within > shared > > > > > > memory and for requests placed in virtqueues that refer to shared > > > memory. > > > > > > > > > > My interpretation of that statement is that putting raw guest > physical > > > > > addresses corresponding to virtio shared memory regions into a > vring > > > > > is a driver spec violation. > > > > > > > > > > -David > > > > > > > > This really applies within device I think. Should be clarified ... > > > > > > You mean that a virtio device can use absolute memory addresses for > > > other devices' shared memory regions, but it can't use absolute memory > > > addresses for its own shared memory regions? That's a rather strange > > > requirement. Or is the statement simply giving an addressing strategy > > > that device type specifications are free to ignore? > > > > > > -David > > > > > > >
On Fri, 6 Sept 2024 at 00:19, David Stevens <stevensd@chromium.org> wrote: > > On Fri, Sep 6, 2024 at 12:56 AM Stefan Hajnoczi <stefanha@redhat.com> wrote: > > > > On Tue, Jul 16, 2024 at 10:21:35AM +0900, David Stevens wrote: > > > On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote: > > > > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote: > > > > > > > > > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in > > > > > > crosvm a couple of years ago. > > > > > > > > > > > > David, I'd be particularly interested for your thoughts on the MEM_READ > > > > > > and MEM_WRITE commands, since as far as I know crosvm doesn't implement > > > > > > anything like that. The discussion leading to those being added starts > > > > > > here: > > > > > > > > > > > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/ > > > > > > > > > > > > It would be great if this could be standardised between QEMU and crosvm > > > > > > (and therefore have a clearer path toward being implemented in other VMMs)! > > > > > > > > > > Setting aside vhost-user for a moment, the DAX example given by Stefan > > > > > won't work in crosvm today. > > > > > > > > > > Is universal access to virtio shared memory regions actually mandated > > > > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing > > > > > seems reasonable enough, but what about virtio-pmem to virtio-blk? > > > > > What about screenshotting a framebuffer in virtio-gpu shared memory to > > > > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a > > > > > virtualized environment. But what about when you have real hardware > > > > > that speaks virtio involved? That's outside my wheelhouse, but it > > > > > doesn't seem like that would be easy to solve. > > > > > > > > Yes, it can work for physical devices if allowed by host configuration. 
> > > > E.g. VFIO supports that I think. Don't think VDPA does. > > > > > > I'm sure it can work, but that sounds more like a SHOULD (MAY?), > > > rather than a MUST. > > > > > > > > For what it's worth, my interpretation of the target scenario: > > > > > > > > > > > Other backends don't see these mappings. If the guest submits a vring > > > > > > descriptor referencing a mapping to another backend, then that backend > > > > > > won't be able to access this memory > > > > > > > > > > is that it's omitting how the implementation is reconciled with > > > > > section 2.10.1 of v1.3 of the virtio spec, which states that: > > > > > > > > > > > References into shared memory regions are represented as offsets from > > > > > > the beginning of the region instead of absolute memory addresses. Offsets > > > > > > are used both for references between structures stored within shared > > > > > > memory and for requests placed in virtqueues that refer to shared memory. > > > > > > > > > > My interpretation of that statement is that putting raw guest physical > > > > > addresses corresponding to virtio shared memory regions into a vring > > > > > is a driver spec violation. > > > > > > > > > > -David > > > > > > > > This really applies within device I think. Should be clarified ... > > > > > > You mean that a virtio device can use absolute memory addresses for > > > other devices' shared memory regions, but it can't use absolute memory > > > addresses for its own shared memory regions? That's a rather strange > > > requirement. Or is the statement simply giving an addressing strategy > > > that device type specifications are free to ignore? > > > > My recollection of the intent behind the quoted section is: > > > > 1. Structures in shared memory that point to shared memory must use > > relative offsets instead of absolute physical addresses. > > 2. Virtqueue requests that refer to shared memory (e.g. 
map this page > > from virtiofs file to this location in shared memory) must use > > relative offsets instead of absolute physical addresses. > > > > In other words, shared memory must be relocatable. Don't assume Shared > > Memory Regions have an absolute guest physical address. This makes > > device implementations independent of the guest physical memory layout > > and might also help when Shared Memory Regions are exposed to guest > > user-space where the guest physical memory layout isn't known. > > Doesn't this discussion contradict the necessity of point 1? If I'm > understanding things correctly, it is valid for virtio device A to > refer to a structure in virtio device B's shared memory region by > absolute guest physical address. At least there is nothing in the spec > about resolving shmid values among different virtio devices, so > absolute guest physical addresses are the only way this sharing can be > done. And if it's valid for a pointer to a structure in a shared > memory region to exist, it's not clear to me why you can't have > pointers within a shared memory region. The reason is that VIRTIO has a layered design where the transport and vring layout deal with bus addresses but device types generally do not (except for specific exceptions like memory ballooning, etc). A device's virtqueue requests do not contain addresses (e.g. struct virtio_net_hdr). The virtqueue interface hides the details of memory organization and access. In theory a transport could be implemented over a medium that doesn't even offer shared memory (I think people have played with remote VIRTIO over TCP) and this is possible because device types don't know how virtqueue elements are represented. This same design constraint extends to VIRTIO Shared Memory Regions because a VIRTIO Shared Memory Region's contents are defined by the device type, just like virtqueue requests. 
I mentioned that it avoids address translation in device type implementations and also makes it easy to expose VIRTIO Shared Memory Regions to guest userspace. (Similarly, putting addresses into the VIRTIO Configuration Space is also problematic because it exposes details of memory to the device type. They should be hidden by the VIRTIO transport.) That explains the intention within the VIRTIO world. The question you raised was why you're allowed to then pass the address of a VIRTIO Shared Memory Region to another device instead of passing a <shmid, offset> pair. The answer is because DMA is beyond the scope of the VIRTIO spec. If the architecture allows you to expose a buffer that happens to be located in a VIRTIO Shared Memory Region to another device, then it's possible to pass that address. The other device may not even be a VIRTIO device. It just performs a DMA transaction to read/write that memory. This is happening at another layer and it's a valid thing to do. So the answer is that in terms of designing VIRTIO device types, VIRTIO Shared Memory Region structure layouts or virtqueue request structs referring to VIRTIO Shared Memory Regions must not use addresses. But you may be able to pass the address of a VIRTIO Shared Memory Region to another device for DMA. They don't conflict because they are at different levels. > It definitely makes sense that setting up a mapping should be done > with offsets. But unless a shared memory region can be dynamically > reallocated at runtime, then it doesn't seem necessary to ban pointers > within a shared memory region. On a PCI device the BAR containing the VIRTIO Shared Memory Region can be remapped at runtime, so the shared memory region can move. The reason is the same as why device types don't use addresses inside virtqueue request structures: the details of memory are hidden by the VIRTIO transport and the device type doesn't deal in addresses. 
Also, it makes mmapping a VIRTIO Shared Memory Region difficult in guest userspace because now some kind of address translation mechanism is necessary (i.e. IOMMU) so that the userspace application (which doesn't know about physical memory addresses) and the device implementation can translate memory. Just using offsets avoids this problem. Stefan
On Fri, 6 Sept 2024 at 03:06, Albert Esteve <aesteve@redhat.com> wrote: > On Thu, Sep 5, 2024 at 6:39 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: >> >> On Tue, Sep 03, 2024 at 10:42:34AM +0200, Albert Esteve wrote: >> > Hello all, >> > >> > Sorry, I have been a bit disconnected from this thread as I was on >> > vacations and then had to switch tasks for a while. >> > >> > I will try to go through all comments and address them for the first >> > non-RFC drop of this patch series. >> > >> > But I was discussing with some colleagues on this. So turns out rust-vmm's >> > vhost-user-gpu will potentially use >> > this soon, and a rust-vmm/vhost patch has already been posted: >> > https://github.com/rust-vmm/vhost/pull/251. >> > So I think it may make sense to: >> > 1. Split the vhost-user documentation patch once settled. Since it is taken >> > as the official spec, >> > having it upstreamed independently of the implementation will help >> > other projects >> > work on/integrate their own code. >> > 2. Split READ_/WRITE_MEM messages from SHMEM_MAP/_UNMAP patches. >> > If I remember correctly, this addresses a virtio-fs specific issue >> > that will not >> > impact either virtio-gpu or virtio-media, or any other. >> >> This is an architectural issue that arises from exposing VIRTIO Shared >> Memory Regions in vhost-user. It was first seen with Linux virtiofs but >> it could happen with other devices and/or guest operating systems. >> >> Any VIRTIO Shared Memory Region that can be mmapped into Linux userspace >> may trigger this issue. Userspace may write(2) to an O_DIRECT file with >> the mmap as the source. The vhost-user-blk device will not be able to >> access the source device's VIRTIO Shared Memory Region and will fail. >> >> > So it may make >> > sense >> > to separate them so that one does not stall the other. I will try to >> > have both >> > integrated in the mid term. 
>> >> If READ_/WRITE_MEM is a pain to implement (I think it is in the >> vhost-user back-end, even though I've been a proponent of it), then >> another way to deal with this issue is to specify that upon receiving >> MAP/UNMAP messages, the vhost-user front-end must update the vhost-user >> memory tables of all other vhost-user devices. That way vhost-user >> devices will be able to access VIRTIO Shared Memory Regions mapped by >> other devices. >> >> Implementing this in QEMU should be much easier than implementing >> READ_/WRITE_MEM support in device back-ends. >> >> This will be slow and scale poorly but performance is only a problem for >> devices that frequently MAP/UNMAP like virtiofs. Will virtio-gpu and >> virtio-media use MAP/UNMAP often at runtime? They might be able to get >> away with this simple solution. >> >> I'd be happy with that. If someone wants to make virtiofs DAX faster, >> they can implement READ/WRITE_MEM or another solution later, but let's >> at least make things correct from the start. > > > I agree. I want it to be correct first. If you agree on splitting the spec bits from this > patch I'm already happy. I suggested splitting READ_/WRITE_MEM messages > because I thought that it was a virtiofs-specific issue. > > The alternative that you proposed is interesting. I'll take it into account. But I > feel I prefer to go for the better solution, and if I get too entangled, then switch > to the easier implementation. Great. The difficult part of implementing READ_/WRITE_MEM messages is modifying libvhost-user and rust-vmm's vhost crate to send the new messages when address translation fails. This needs to cover all memory accesses (including vring struct accesses). That code may be a few levels down in the call stack and assume it can always load/store directly from mmapped memory. > > I think we could do this in 2 patches: > 1. Split the documentation bits for SHMEM_MAP/_UNMAP. 
The > implementation for these messages will go into the second patch. > 2. The implementation patch: keep going for the time being with > READ_/WRITE_MEM support. And the documentation for that > is kept within this patch. This way, if we switch to the frontend > updating the vhost-user memory tables, we won't be tied to any specific > solution if patch 1 has already been merged. I'm happy as long as the vhost-user spec patch that introduces MAP/UNMAP also covers a solution for the memory access problem (either READ_/WRITE_MEM or propagating mappings to all vhost-user back-ends). Stefan > > BR, > Albert. > >> >> >> Stefan >> >> > >> > WDYT? >> > >> > BR, >> > Albert. >> > >> > On Tue, Jul 16, 2024 at 3:21 AM David Stevens <stevensd@chromium.org> wrote: >> > >> > > On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <mst@redhat.com> wrote: >> > > > >> > > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote: >> > > > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <hi@alyssa.is> wrote: >> > > > > > >> > > > > > Adding David Stevens, who implemented SHMEM_MAP and SHMEM_UNMAP in >> > > > > > crosvm a couple of years ago. >> > > > > > >> > > > > > David, I'd be particularly interested for your thoughts on the >> > > MEM_READ >> > > > > > and MEM_WRITE commands, since as far as I know crosvm doesn't >> > > implement >> > > > > > anything like that. The discussion leading to those being added >> > > starts >> > > > > > here: >> > > > > > >> > > > > > >> > > https://lore.kernel.org/qemu-devel/20240604185416.GB90471@fedora.redhat.com/ >> > > > > > >> > > > > > It would be great if this could be standardised between QEMU and >> > > crosvm >> > > > > > (and therefore have a clearer path toward being implemented in other >> > > VMMs)! >> > > > > >> > > > > Setting aside vhost-user for a moment, the DAX example given by Stefan >> > > > > won't work in crosvm today. 
>> > > > > >> > > > > Is universal access to virtio shared memory regions actually mandated >> > > > > by the virtio spec? Copying from virtiofs DAX to virtiofs sharing >> > > > > seems reasonable enough, but what about virtio-pmem to virtio-blk? >> > > > > What about screenshotting a framebuffer in virtio-gpu shared memory to >> > > > > virtio-scsi? I guess with some plumbing in the VMM, it's solvable in a >> > > > > virtualized environment. But what about when you have real hardware >> > > > > that speaks virtio involved? That's outside my wheelhouse, but it >> > > > > doesn't seem like that would be easy to solve. >> > > > >> > > > Yes, it can work for physical devices if allowed by host configuration. >> > > > E.g. VFIO supports that I think. Don't think VDPA does. >> > > >> > > I'm sure it can work, but that sounds more like a SHOULD (MAY?), >> > > rather than a MUST. >> > > >> > > > > For what it's worth, my interpretation of the target scenario: >> > > > > >> > > > > > Other backends don't see these mappings. If the guest submits a vring >> > > > > > descriptor referencing a mapping to another backend, then that >> > > backend >> > > > > > won't be able to access this memory >> > > > > >> > > > > is that it's omitting how the implementation is reconciled with >> > > > > section 2.10.1 of v1.3 of the virtio spec, which states that: >> > > > > >> > > > > > References into shared memory regions are represented as offsets from >> > > > > > the beginning of the region instead of absolute memory addresses. >> > > Offsets >> > > > > > are used both for references between structures stored within shared >> > > > > > memory and for requests placed in virtqueues that refer to shared >> > > memory. >> > > > > >> > > > > My interpretation of that statement is that putting raw guest physical >> > > > > addresses corresponding to virtio shared memory regions into a vring >> > > > > is a driver spec violation. 
>> > > > > >> > > > > -David >> > > > >> > > > This really applies within device I think. Should be clarified ... >> > > >> > > You mean that a virtio device can use absolute memory addresses for >> > > other devices' shared memory regions, but it can't use absolute memory >> > > addresses for its own shared memory regions? That's a rather strange >> > > requirement. Or is the statement simply giving an addressing strategy >> > > that device type specifications are free to ignore? >> > > >> > > -David >> > > >> > >