Message ID | 20230401124257.24537-1-graf@amazon.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v4] hostmem-file: add offset option |
On Sat, Apr 01, 2023 at 12:42:57PM +0000, Alexander Graf wrote:
> Add an option for hostmem-file to start the memory object at an offset
> into the target file. This is useful if multiple memory objects reside
> inside the same target file, such as a device node.
>
> In particular, it's useful to map guest memory directly into /dev/mem
> for experimentation.
>
> Signed-off-by: Alexander Graf <graf@amazon.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
>
> ---
>
> v1 -> v2:
>
>   - add qom documentation
>   - propagate offset into truncate, size and alignment checks
>
> v2 -> v3:
>
>   - failed attempt at fixing typo
>
> v2 -> v4:
>
>   - fix typo
> ---
>  backends/hostmem-file.c | 40 +++++++++++++++++++++++++++++++++++++++-
>  include/exec/memory.h   |  2 ++
>  include/exec/ram_addr.h |  3 ++-
>  qapi/qom.json           |  5 +++++
>  qemu-options.hx         |  6 +++++-
>  softmmu/memory.c        |  3 ++-
>  softmmu/physmem.c       | 14 ++++++++++----
>  7 files changed, 65 insertions(+), 8 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

On 01.04.23 19:47, Stefan Hajnoczi wrote:
> On Sat, Apr 01, 2023 at 12:42:57PM +0000, Alexander Graf wrote:
>> Add an option for hostmem-file to start the memory object at an offset
>> into the target file. This is useful if multiple memory objects reside
>> inside the same target file, such as a device node.
>>
>> [...]
>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

The change itself looks good to me, but I do think some other QEMU code
that ends up working on the RAMBlock is not prepared yet. Most probably,
because we never ended up using fd with an offset as guest RAM.

We don't seem to be remembering that offset in the RAMBlock. First, I
thought block->offset would be used for that, but that's just the offset
in the ram_addr_t space. Maybe we need a new "block->fd_offset" to
remember the offset (unless I am missing something).

The real offset in the file would be required at least in two cases I
can see (whenever we essentially end up calling mmap() on the fd again):

1) qemu_ram_remap(): We'd have to add the file offset on top of the
   calculated offset.

2) vhost-user: most probably whenever we set the mmap_offset. For
   example, in vhost_user_fill_set_mem_table_msg() we'd similarly have to
   add the file_offset on top of the calculated offset.
   vhost_user_get_mr_data() should most probably do that.
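
To make the remap concern concrete, here is a minimal sketch. It is not QEMU code: `struct demo_block`, `demo_remap_page()` and `demo_run()` are hypothetical names, and `fd_offset` merely stands in for the RAMBlock field proposed above. The sketch maps a window of a temporary file that starts at a non-zero file offset, then maps one page of that window again in place with MAP_FIXED. The second mmap() only reaches the right file page if the window's file offset is added to the offset within the block.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the RAMBlock fields that matter here. */
struct demo_block {
    int fd;
    off_t fd_offset;      /* where the block starts inside the file */
    size_t size;
    unsigned char *host;  /* start of the mapping in this process */
};

/* Map one page of the block again, in place, from the backing file. */
static int demo_remap_page(struct demo_block *b, size_t off_in_block,
                           size_t page)
{
    assert(off_in_block % page == 0);  /* MAP_FIXED needs page alignment */
    void *p = mmap(b->host + off_in_block, page, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, b->fd,
                   b->fd_offset + (off_t)off_in_block);
    return p == MAP_FAILED ? -1 : 0;
}

/* Returns 0 if the remapped page still shows the expected file page. */
int demo_run(void)
{
    struct demo_block b = { -1, 0, 0, NULL };
    char path[] = "/tmp/fd-offset-demo-XXXXXX";
    long ps = sysconf(_SC_PAGESIZE);
    size_t page;
    void *m;
    int ret = -1;

    if (ps <= 0) {
        return -1;
    }
    page = (size_t)ps;
    b.fd = mkstemp(path);
    if (b.fd < 0) {
        return -1;
    }
    unlink(path);
    if (ftruncate(b.fd, (off_t)(4 * page)) != 0) {
        goto out;
    }
    /* Tag file pages 0..3 with 'A'..'D' so they can be told apart. */
    for (size_t i = 0; i < 4; i++) {
        unsigned char tag = (unsigned char)('A' + i);
        if (pwrite(b.fd, &tag, 1, (off_t)(i * page)) != 1) {
            goto out;
        }
    }
    /* The block covers file pages 2 and 3. */
    b.fd_offset = (off_t)(2 * page);
    b.size = 2 * page;
    m = mmap(NULL, b.size, PROT_READ | PROT_WRITE, MAP_SHARED,
             b.fd, b.fd_offset);
    if (m == MAP_FAILED) {
        goto out;
    }
    b.host = m;
    /* Block page 1 is file page 3; it must stay so after the remap. */
    if (demo_remap_page(&b, page, page) == 0) {
        ret = (b.host[0] == 'C' && b.host[page] == 'D') ? 0 : 1;
    }
    munmap(b.host, b.size);
out:
    close(b.fd);
    return ret;
}
```

If `fd_offset` were dropped from the second mmap() call, block page 1 would be backed by file page 1 (tagged 'B') after the remap instead of file page 3.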

On Mon, Apr 03, 2023 at 09:13:29AM +0200, David Hildenbrand wrote:
> On 01.04.23 19:47, Stefan Hajnoczi wrote:
> > On Sat, Apr 01, 2023 at 12:42:57PM +0000, Alexander Graf wrote:
> > > Add an option for hostmem-file to start the memory object at an offset
> > > into the target file.
> > >
> > > [...]
> >
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> The change itself looks good to me, but I do think some other QEMU code that
> ends up working on the RAMBlock is not prepared yet. Most probably, because
> we never ended up using fd with an offset as guest RAM.
>
> We don't seem to be remembering that offset in the RAMBlock. First, I
> thought block->offset would be used for that, but that's just the offset in
> the ram_addr_t space. Maybe we need a new "block->fd_offset" to remember the
> offset (unless I am missing something).

I think you're right.

> The real offset in the file would be required at least in two cases I can
> see (whenever we essentially end up calling mmap() on the fd again):
>
> 1) qemu_ram_remap(): We'd have to add the file offset on top of the
> calculated offset.
>
> 2) vhost-user: most probably whenever we set the mmap_offset. For example,
> in vhost_user_fill_set_mem_table_msg() we'd similarly have to add the
> file_offset on top of the calculated offset. vhost_user_get_mr_data() should
> most probably do that.

I had a patch to add that offset for the upcoming doublemap feature here:

https://lore.kernel.org/all/20230117220914.2062125-8-peterx@redhat.com/

But that was because doublemap wants to map the guest mem twice for other
purposes. I didn't yet notice that the code seem to be already broken if
without offset==0.

While, I _think_ we already have offset!=0 case for a ramblock, since:

    commit ed5d001916dd46ceed6d8850e453bcd7b5db2acb
    Author: Jagannathan Raman <jag.raman@oracle.com>
    Date:   Fri Jan 29 11:46:13 2021 -0500

        multi-process: setup memory manager for remote device

Where there's:

    memory_region_init_ram_from_fd(subregion, NULL,
                                   name, sysmem_info->sizes[region],
                                   RAM_SHARED, msg->fds[region],
                                   sysmem_info->offsets[region],
                                   errp);

Thanks,

On 03.04.23 17:49, Peter Xu wrote:
> On Mon, Apr 03, 2023 at 09:13:29AM +0200, David Hildenbrand wrote:
>> [...]
>>
>> We don't seem to be remembering that offset in the RAMBlock. First, I
>> thought block->offset would be used for that, but that's just the offset in
>> the ram_addr_t space. Maybe we need a new "block->fd_offset" to remember the
>> offset (unless I am missing something).
>
> I think you're right.
>
>> The real offset in the file would be required at least in two cases I can
>> see (whenever we essentially end up calling mmap() on the fd again):
>>
>> 1) qemu_ram_remap(): We'd have to add the file offset on top of the
>> calculated offset.
>>
>> 2) vhost-user: most probably whenever we set the mmap_offset. For example,
>> in vhost_user_fill_set_mem_table_msg() we'd similarly have to add the
>> file_offset on top of the calculated offset. vhost_user_get_mr_data() should
>> most probably do that.
>
> I had a patch to add that offset for the upcoming doublemap feature here:
>
> https://lore.kernel.org/all/20230117220914.2062125-8-peterx@redhat.com/
>
> But that was because doublemap wants to map the guest mem twice for other
> purposes. I didn't yet notice that the code seem to be already broken if
> without offset==0.
>
> While, I _think_ we already have offset!=0 case for a ramblock, since:
>
>     commit ed5d001916dd46ceed6d8850e453bcd7b5db2acb
>     Author: Jagannathan Raman <jag.raman@oracle.com>
>     Date:   Fri Jan 29 11:46:13 2021 -0500
>
>         multi-process: setup memory manager for remote device
>
> Where there's:
>
>     memory_region_init_ram_from_fd(subregion, NULL,
>                                    name, sysmem_info->sizes[region],
>                                    RAM_SHARED, msg->fds[region],
>                                    sysmem_info->offsets[region],
>                                    errp);

Interesting ... maybe so far never used alongside vhost-user.

On 03.04.23 09:13, David Hildenbrand wrote:
> On 01.04.23 19:47, Stefan Hajnoczi wrote:
>> On Sat, Apr 01, 2023 at 12:42:57PM +0000, Alexander Graf wrote:
>>> Add an option for hostmem-file to start the memory object at an offset
>>> into the target file.
>>>
>>> [...]
>>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> The change itself looks good to me, but I do think some other QEMU code
> that ends up working on the RAMBlock is not prepared yet. Most probably,
> because we never ended up using fd with an offset as guest RAM.
>
> We don't seem to be remembering that offset in the RAMBlock. First, I
> thought block->offset would be used for that, but that's just the offset
> in the ram_addr_t space. Maybe we need a new "block->fd_offset" to
> remember the offset (unless I am missing something).
>
> The real offset in the file would be required at least in two cases I
> can see (whenever we essentially end up calling mmap() on the fd again):
>
> 1) qemu_ram_remap(): We'd have to add the file offset on top of the
> calculated offset.

This one is a bit tricky to test, as we're only running into that code
path with KVM when we see an #MCE. But it's trivial, so I'm confident it
will work as expected.

> 2) vhost-user: most probably whenever we set the mmap_offset. For
> example, in vhost_user_fill_set_mem_table_msg() we'd similarly have to
> add the file_offset on top of the calculated offset.
> vhost_user_get_mr_data() should most probably do that.

I agree - adding the offset as part of get_mr_data() is sufficient. I
have validated it works correctly with QEMU's vhost-user-blk target.

I think the changes are still obvious enough that I'll fold them all
into a single patch.

Alex

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Management: Christian Schlaeger, Jonathan Weiss
Registered at the Charlottenburg district court under HRB 149173 B
Registered office: Berlin
VAT ID: DE 289 237 879
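
For the vhost-user case, the arithmetic under discussion can be sketched as follows. This is only an illustration under stated assumptions: `struct demo_ram_block` and `demo_mmap_offset()` are hypothetical names, not the actual vhost_user_get_mr_data() code, and `fd_offset` is the proposed RAMBlock field. The point is that the offset handed to the peer for its own mmap() of the shared fd is the block's offset in the file plus the region's offset within the block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the RAMBlock fields that matter here. */
struct demo_ram_block {
    uint8_t *host;       /* start of the block's mapping in this process */
    uint64_t fd_offset;  /* where the block starts inside the backing fd */
};

/*
 * Offset a peer would have to pass to mmap() on the shared fd so that
 * its mapping starts at the same file contents as region_host.
 */
static uint64_t demo_mmap_offset(const struct demo_ram_block *rb,
                                 const uint8_t *region_host)
{
    assert(region_host >= rb->host);
    uint64_t offset_in_block = (uint64_t)(region_host - rb->host);

    return rb->fd_offset + offset_in_block;
}
```

With a block that starts 0x200000 bytes into the file and a region 0x1000 bytes into that block, the sketch returns 0x201000. Using only the offset within the block would give 0x1000, which points at unrelated file contents.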

>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>>
>> [...]
>>
>> The real offset in the file would be required at least in two cases I
>> can see (whenever we essentially end up calling mmap() on the fd again):
>>
>> 1) qemu_ram_remap(): We'd have to add the file offset on top of the
>> calculated offset.
>
> This one is a bit tricky to test, as we're only running into that code
> path with KVM when we see an #MCE. But it's trivial, so I'm confident it
> will work as expected.

Indeed.

>> 2) vhost-user: most probably whenever we set the mmap_offset. For
>> example, in vhost_user_fill_set_mem_table_msg() we'd similarly have to
>> add the file_offset on top of the calculated offset.
>> vhost_user_get_mr_data() should most probably do that.
>
> I agree - adding the offset as part of get_mr_data() is sufficient. I
> have validated it works correctly with QEMU's vhost-user-blk target.
>
> I think the changes are still obvious enough that I'll fold them all
> into a single patch.

Most probably good enough. Having the offset part separately as a fix
for ed5d001916 ("multi-process: setup memory manager for remote device")
could be beneficial, though.

Thanks Alex!

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 25141283c4..38ea65bec5 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -27,6 +27,7 @@ struct HostMemoryBackendFile {
 
     char *mem_path;
     uint64_t align;
+    uint64_t offset;
     bool discard_data;
     bool is_pmem;
     bool readonly;
@@ -58,7 +59,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
     memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
                                      backend->size, fb->align, ram_flags,
-                                     fb->mem_path, fb->readonly, errp);
+                                     fb->mem_path, fb->offset, fb->readonly,
+                                     errp);
     g_free(name);
 #endif
 }
@@ -125,6 +127,36 @@ static void file_memory_backend_set_align(Object *o, Visitor *v,
     fb->align = val;
 }
 
+static void file_memory_backend_get_offset(Object *o, Visitor *v,
+                                           const char *name, void *opaque,
+                                           Error **errp)
+{
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+    uint64_t val = fb->offset;
+
+    visit_type_size(v, name, &val, errp);
+}
+
+static void file_memory_backend_set_offset(Object *o, Visitor *v,
+                                           const char *name, void *opaque,
+                                           Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+    HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+    uint64_t val;
+
+    if (host_memory_backend_mr_inited(backend)) {
+        error_setg(errp, "cannot change property '%s' of %s", name,
+                   object_get_typename(o));
+        return;
+    }
+
+    if (!visit_type_size(v, name, &val, errp)) {
+        return;
+    }
+    fb->offset = val;
+}
+
 #ifdef CONFIG_LIBPMEM
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
@@ -197,6 +229,12 @@ file_backend_class_init(ObjectClass *oc, void *data)
         file_memory_backend_get_align,
         file_memory_backend_set_align,
         NULL, NULL);
+    object_class_property_add(oc, "offset", "int",
+        file_memory_backend_get_offset,
+        file_memory_backend_set_offset,
+        NULL, NULL);
+    object_class_property_set_description(oc, "offset",
+        "Offset into the target file (ex: 1G)");
 #ifdef CONFIG_LIBPMEM
     object_class_property_add_bool(oc, "pmem",
         file_memory_backend_get_pmem, file_memory_backend_set_pmem);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 15ade918ba..3b7295fbe2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1318,6 +1318,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *             RAM_NORESERVE,
  * @path: the path in which to allocate the RAM.
+ * @offset: offset within the file referenced by path
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -1331,6 +1332,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       uint64_t align,
                                       uint32_t ram_flags,
                                       const char *path,
+                                      ram_addr_t offset,
                                       bool readonly,
                                       Error **errp);
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index f4fb6a2111..90a8269290 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -110,6 +110,7 @@ long qemu_maxrampagesize(void);
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *             RAM_NORESERVE.
  * @mem_path or @fd: specify the backing file or device
+ * @offset: Offset into target file
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens
  *
@@ -119,7 +120,7 @@ long qemu_maxrampagesize(void);
  */
 RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
                                    uint32_t ram_flags, const char *mem_path,
-                                   bool readonly, Error **errp);
+                                   off_t offset, bool readonly, Error **errp);
 RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
                                  uint32_t ram_flags, int fd, off_t offset,
                                  bool readonly, Error **errp);
diff --git a/qapi/qom.json b/qapi/qom.json
index a877b879b9..bbb0664062 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -635,6 +635,10 @@
 #         specify the required alignment via this option.
 #         0 selects a default alignment (currently the page size). (default: 0)
 #
+# @offset: the offset into the target file that the region starts at. You can
+#          use this option to overload multiple regions into a single file.
+#          (default: 0)
+#
 # @discard-data: if true, the file contents can be destroyed when QEMU exits,
 #                to avoid unnecessarily flushing data to the backing file. Note
 #                that ``discard-data`` is only an optimization, and QEMU might
@@ -655,6 +659,7 @@
 { 'struct': 'MemoryBackendFileProperties',
   'base': 'MemoryBackendProperties',
   'data': { '*align': 'size',
+            '*offset': 'size',
             '*discard-data': 'bool',
             'mem-path': 'str',
             '*pmem': { 'type': 'bool', 'if': 'CONFIG_LIBPMEM' },
diff --git a/qemu-options.hx b/qemu-options.hx
index 59bdf67a2c..701cf39feb 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4859,7 +4859,7 @@ SRST
     they are specified. Note that the 'id' property must be set. These
     objects are placed in the '/objects' path.
 
-    ``-object memory-backend-file,id=id,size=size,mem-path=dir,share=on|off,discard-data=on|off,merge=on|off,dump=on|off,prealloc=on|off,host-nodes=host-nodes,policy=default|preferred|bind|interleave,align=align,readonly=on|off``
+    ``-object memory-backend-file,id=id,size=size,mem-path=dir,share=on|off,discard-data=on|off,merge=on|off,dump=on|off,prealloc=on|off,host-nodes=host-nodes,policy=default|preferred|bind|interleave,align=align,offset=offset,readonly=on|off``
         Creates a memory file backend object, which can be used to back
         the guest RAM with huge pages.
 
@@ -4929,6 +4929,10 @@ SRST
         such cases, users can specify the required alignment via this
         option.
 
+        The ``offset`` option specifies the offset into the target file
+        that the region starts at. You can use this parameter to overload
+        multiple regions into a single file.
+
         The ``pmem`` option specifies whether the backing file specified
         by ``mem-path`` is in host persistent memory that can be accessed
         using the SNIA NVM programming model (e.g. Intel
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 5305aca7ca..9f620085a0 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1601,6 +1601,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
                                       uint64_t align,
                                       uint32_t ram_flags,
                                       const char *path,
+                                      ram_addr_t offset,
                                       bool readonly,
                                       Error **errp)
 {
@@ -1612,7 +1613,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
     mr->destructor = memory_region_destructor_ram;
     mr->align = align;
     mr->ram_block = qemu_ram_alloc_from_file(size, mr, ram_flags, path,
-                                             readonly, &err);
+                                             offset, readonly, &err);
     if (err) {
         mr->size = int128_zero();
         object_unparent(OBJECT(mr));
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index e35061bba4..829c508b3b 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1369,6 +1369,11 @@ static void *file_ram_alloc(RAMBlock *block,
         error_setg(errp, "alignment 0x%" PRIx64
                    " must be a power of two", block->mr->align);
         return NULL;
+    } else if (offset % block->page_size) {
+        error_setg(errp, "offset 0x%" PRIx64
+                   " must be multiples of page size 0x%zx",
+                   offset, block->page_size);
+        return NULL;
     }
     block->mr->align = MAX(block->page_size, block->mr->align);
 #if defined(__s390x__)
@@ -1400,7 +1405,7 @@ static void *file_ram_alloc(RAMBlock *block,
      * those labels. Therefore, extending the non-empty backend file
      * is disabled as well.
      */
-    if (truncate && ftruncate(fd, memory)) {
+    if (truncate && ftruncate(fd, offset + memory)) {
         perror("ftruncate");
     }
 
@@ -1889,7 +1894,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 
     size = HOST_PAGE_ALIGN(size);
     file_size = get_file_size(fd);
-    if (file_size > 0 && file_size < size) {
+    if (file_size > offset && file_size < (offset + size)) {
         error_setg(errp, "backing store size 0x%" PRIx64
                    " does not match 'size' option 0x" RAM_ADDR_FMT,
                    file_size, size);
@@ -1929,7 +1934,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 
 RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
                                    uint32_t ram_flags, const char *mem_path,
-                                   bool readonly, Error **errp)
+                                   off_t offset, bool readonly, Error **errp)
 {
     int fd;
     bool created;
@@ -1941,7 +1946,8 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
         return NULL;
     }
 
-    block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, 0, readonly, errp);
+    block = qemu_ram_alloc_from_fd(size, mr, ram_flags, fd, offset, readonly,
+                                   errp);
     if (!block) {
         if (created) {
             unlink(mem_path);
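
The two sanity checks the physmem.c hunks add can be summarised in a small standalone sketch. This is not the QEMU code itself: `demo_check_mapping()` and the `demo_result` values are hypothetical names, and the two conditions, which the patch places in file_ram_alloc() and qemu_ram_alloc_from_fd() respectively, are merged into one function purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum demo_result {
    DEMO_OK,
    DEMO_ERR_UNALIGNED,  /* offset is not a multiple of the page size */
    DEMO_ERR_TOO_SMALL   /* file ends inside the requested window */
};

/*
 * Mirrors the conditions from the patch:
 *   offset % page_size != 0                           -> rejected
 *   file_size > offset && file_size < offset + size   -> rejected
 * A file no larger than the offset passes the size check, just as a
 * file size of 0 passed the check before the patch.
 */
static enum demo_result demo_check_mapping(uint64_t file_size,
                                           uint64_t offset,
                                           uint64_t size,
                                           uint64_t page_size)
{
    assert(page_size != 0);
    if (offset % page_size) {
        return DEMO_ERR_UNALIGNED;
    }
    if (file_size > offset && file_size < offset + size) {
        return DEMO_ERR_TOO_SMALL;
    }
    return DEMO_OK;
}
```

For example, with a 4096-byte page size, an 8192-byte file accepts a 4096-byte window at offset 4096, while a 6000-byte file is rejected for the same window because it ends partway through it.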