Message ID | 20230421011223.718-13-gurchetansingh@chromium.org (mailing list archive)
---|---
State | New, archived
Series | gfxstream + rutabaga_gfx: a surprising delight or startling epiphany?
On 2023/04/21 10:12, Gurchetan Singh wrote:
> I just copied the patches that have been floating around that do
> this, but it doesn't seem to robustly work. This current
> implementation is probably good enough to run vkcube or simple
> apps, but whenever a test starts to aggressively map/unmap memory,
> things do explode on the QEMU side.
>
> A simple way to reproduce is to run:
>
> ./deqp-vk --deqp-case=dEQP-VK.memory.mapping.suballocation.*
>
> You should get stack traces that sometimes look like this:
>
> [stack traces snipped; reproduced in full in the commit message below]
>
> The reason seems to be that memory regions are handled on a different
> thread than the virtio-gpu thread, and that inevitably leads to
> raciness. The memory region docs[a] generally seem to dissuade this:
>
> "In order to do this, as a general rule do not create or destroy
> memory regions dynamically during a device’s lifetime, and only
> call object_unparent() in the memory region owner’s instance_finalize
> callback. The dynamically allocated data structure that contains
> the memory region then should obviously be freed in the
> instance_finalize callback as well."
>
> instance_finalize is called just before device destruction, though,
> so storing the memory until then is unlikely to be an option. The
> tests do pass when virtio-gpu doesn't free the memory, but
> progressively the guest becomes slower and then OOMs.
>
> Though the API does make an exception:
>
> "There is an exception to the above rule: it is okay to call
> object_unparent at any time for an alias or a container region. It is
> therefore also okay to create or destroy alias and container regions
> dynamically during a device’s lifetime."
>
> I believe we are trying to create a container subregion, but that's
> still failing? Are we doing it right? Can any memory region experts
> here help out? The other relevant patch in this series
> is "virtio-gpu: hostmem".

Perhaps dma_memory_map() is what you want?

> [a] https://qemu.readthedocs.io/en/latest/devel/memory.html
>
> Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
> ---
>  hw/display/virtio-gpu-rutabaga.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> [patch snipped; reproduced in full below]
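The failure mode above — flatview teardown on the RCU thread
(call_rcu_thread → flatview_destroy → phys_section_destroy) touching a
MemoryRegion whose storage the virtio-gpu thread has already reclaimed —
suggests one possible mitigation: defer freeing the region's storage
until an RCU grace period has elapsed. Below is a minimal, untested
sketch; the BlobRegion wrapper and function names are invented, not from
this series, and only the memory API and RCU calls are real QEMU API.

#include "qemu/osdep.h"
#include "qemu/rcu.h"
#include "exec/memory.h"

/* Hypothetical wrapper; 'rcu' must be the first field for g_free_rcu(). */
typedef struct BlobRegion {
    struct rcu_head rcu;
    MemoryRegion mr;
} BlobRegion;

/* Unmap path sketch: tear the region down as the patch does, but defer
 * freeing the wrapper until no RCU reader (e.g. a FlatView still being
 * destroyed on call_rcu_thread) can reference ->mr. */
static void blob_region_unmap(MemoryRegion *hostmem, BlobRegion *br)
{
    memory_region_transaction_begin();
    memory_region_set_enabled(&br->mr, false);
    memory_region_del_subregion(hostmem, &br->mr);
    object_unparent(OBJECT(&br->mr));
    memory_region_transaction_commit();

    g_free_rcu(br, rcu);   /* freed only after the grace period */
}

Whether this alone would be enough here is unclear: it only addresses
the lifetime of the MemoryRegion struct itself, not any ordering
requirements of the map/unmap calls into rutabaga.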
On Fri, 21 Apr 2023 at 02:13, Gurchetan Singh
<gurchetansingh@chromium.org> wrote:
> Though the API does make an exception:
>
> "There is an exception to the above rule: it is okay to call
> object_unparent at any time for an alias or a container region. It is
> therefore also okay to create or destroy alias and container regions
> dynamically during a device’s lifetime."
>
> I believe we are trying to create a container subregion, but that's
> still failing?

> @@ -671,6 +677,14 @@ rutabaga_cmd_resource_map_blob(VirtIOGPU *g,
>      result = rutabaga_resource_map(rutabaga, mblob.resource_id, &mapping);
>      CHECK_RESULT(result, cmd);
>
> +    memory_region_transaction_begin();
> +    memory_region_init_ram_device_ptr(&res->region, OBJECT(g), NULL,
> +                                      mapping.size, (void *)mapping.ptr);

This isn't a container MemoryRegion -- it is a RAM MR. That is,
accesses to it are backed by a lump of host memory (viz, the one
provided here via mapping.ptr). A container MR is one which provides
no backing mechanism (neither host RAM, nor MMIO read/write
callbacks), and whose contents are purely those of any other
MemoryRegions that you add to it via memory_region_add_subregion().

So the exception listed in the API docs does not apply here.

-- PMM
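To make the distinction above concrete, here is an illustrative sketch
(function and region names are invented, not from this series): a
container is created with memory_region_init() and has no backing of
its own, while memory_region_init_ram_device_ptr() — what the patch
uses — creates a region backed directly by host memory, which is why
the docs' container exception does not cover it.

#include "qemu/osdep.h"
#include "exec/memory.h"

/* Illustrative only: the two kinds of MemoryRegion discussed above. */
static void region_kinds_example(Object *owner, uint64_t size, void *host_ptr)
{
    static MemoryRegion container;
    static MemoryRegion ram_dev;

    /* Container: no backing mechanism of its own; guest accesses resolve
     * purely to whatever subregions are mapped into it. Per the docs,
     * this kind may be created/destroyed dynamically. */
    memory_region_init(&container, owner, "blob-container", size);

    /* RAM-device region: backed by a lump of host memory (host_ptr), so
     * the container exception does not apply to it. */
    memory_region_init_ram_device_ptr(&ram_dev, owner, "blob-mapping",
                                      size, host_ptr);
    memory_region_add_subregion(&container, 0, &ram_dev);
}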
On Sat, Apr 22, 2023 at 8:46 AM Akihiko Odaki <akihiko.odaki@gmail.com> wrote:
>
> On 2023/04/21 10:12, Gurchetan Singh wrote:
> > I just copied the patches that have been floating around that do
> > this, but it doesn't seem to robustly work. This current
> > implementation is probably good enough to run vkcube or simple
> > apps, but whenever a test starts to aggressively map/unmap memory,
> > things do explode on the QEMU side.
> >
> > [reproduction steps and stack traces snipped]
> >
> > The reason seems to be that memory regions are handled on a different
> > thread than the virtio-gpu thread, and that inevitably leads to
> > raciness. The memory region docs[a] generally seem to dissuade this:
> >
> > [memory API docs discussion snipped]
> >
> > I believe we are trying to create a container subregion, but that's
> > still failing? Are we doing it right? Can any memory region experts
> > here help out? The other relevant patch in this series
> > is "virtio-gpu: hostmem".
>
> Perhaps dma_memory_map() is what you want?

It looks like dma_memory_map(..) gives a host virtual address given a
guest DMA address? If so, that won't work. We need to give the guest a
pointer to the host's GPU memory.

I took a look at what the Android Emulator has been doing: it largely
bypasses the MemoryRegion API and defines generic map/unmap functions:

https://android.googlesource.com/platform/external/qemu/+/emu-master-dev/hw/display/virtio-gpu-3d.c#473
https://android.googlesource.com/platform/external/qemu/+/emu-master-dev/exec.c#2472
https://android.googlesource.com/platform/external/qemu/+/emu-master-dev/accel/kvm/kvm-all.c#1787

This is kind of what crosvm does too. That seems the simplest way to do
what we need. Any objections to this approach or alternative methods?

> > [a] https://qemu.readthedocs.io/en/latest/devel/memory.html
> >
> > Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
> >
> > [rest of patch snipped]
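For reference, a sketch of the direction dma_memory_map() actually works
in — resolving a guest DMA address to a host pointer so a device model
can access guest RAM — which is the opposite of what the blob path needs
(exposing host GPU memory at a guest address). The wrapper names below
are invented; the dma_memory_map()/dma_memory_unmap() calls are the real
QEMU API.

#include "qemu/osdep.h"
#include "sysemu/dma.h"

/* Guest address in, host pointer out -- useful for a device model that
 * wants to read a guest buffer, not for publishing host GPU memory. */
static void *map_guest_buffer(AddressSpace *as, dma_addr_t guest_addr,
                              dma_addr_t *len)
{
    return dma_memory_map(as, guest_addr, len,
                          DMA_DIRECTION_TO_DEVICE, MEMTXATTRS_UNSPECIFIED);
}

static void unmap_guest_buffer(AddressSpace *as, void *host, dma_addr_t len)
{
    dma_memory_unmap(as, host, len, DMA_DIRECTION_TO_DEVICE, len);
}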
diff --git a/hw/display/virtio-gpu-rutabaga.c b/hw/display/virtio-gpu-rutabaga.c
index 5fd1154198..196267aac2 100644
--- a/hw/display/virtio-gpu-rutabaga.c
+++ b/hw/display/virtio-gpu-rutabaga.c
@@ -159,6 +159,12 @@ static int32_t rutabaga_handle_unmap(VirtIOGPU *g,
     GET_VIRTIO_GPU_GL(g);
     GET_RUTABAGA(virtio_gpu);
 
+    memory_region_transaction_begin();
+    memory_region_set_enabled(&res->region, false);
+    memory_region_del_subregion(&g->parent_obj.hostmem, &res->region);
+    object_unparent(OBJECT(&res->region));
+    memory_region_transaction_commit();
+
     res->mapped = NULL;
     return rutabaga_resource_unmap(rutabaga, res->resource_id);
 }
@@ -671,6 +677,14 @@ rutabaga_cmd_resource_map_blob(VirtIOGPU *g,
     result = rutabaga_resource_map(rutabaga, mblob.resource_id, &mapping);
     CHECK_RESULT(result, cmd);
 
+    memory_region_transaction_begin();
+    memory_region_init_ram_device_ptr(&res->region, OBJECT(g), NULL,
+                                      mapping.size, (void *)mapping.ptr);
+    memory_region_add_subregion(&g->parent_obj.hostmem, mblob.offset,
+                                &res->region);
+    memory_region_set_enabled(&res->region, true);
+    memory_region_transaction_commit();
+
     memset(&resp, 0, sizeof(resp));
     resp.hdr.type = VIRTIO_GPU_RESP_OK_MAP_INFO;
     result = rutabaga_resource_map_info(rutabaga, mblob.resource_id,
I just copied the patches that have been floating around that do
this, but it doesn't seem to robustly work. This current
implementation is probably good enough to run vkcube or simple
apps, but whenever a test starts to aggressively map/unmap memory,
things do explode on the QEMU side.

A simple way to reproduce is to run:

./deqp-vk --deqp-case=dEQP-VK.memory.mapping.suballocation.*

You should get stack traces that sometimes look like this:

 0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737316304448) at ./nptl/pthread_kill.c:44
 1  __pthread_kill_internal (signo=6, threadid=140737316304448) at ./nptl/pthread_kill.c:78
 2  __GI___pthread_kill (threadid=140737316304448, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
 3  0x00007ffff7042476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
 4  0x00007ffff70287f3 in __GI_abort () at ./stdlib/abort.c:79
 5  0x00007ffff70896f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff71dbb8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155
 6  0x00007ffff70a0d7c in malloc_printerr (str=str@entry=0x7ffff71de7b0 "double free or corruption (out)") at ./malloc/malloc.c:5664
 7  0x00007ffff70a2ef0 in _int_free (av=0x7ffff7219c80 <main_arena>, p=0x555557793e00, have_lock=<optimized out>) at ./malloc/malloc.c:4588
 8  0x00007ffff70a54d3 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
 9  0x0000555555d65e7e in phys_section_destroy (mr=0x555557793e10) at ../softmmu/physmem.c:1003
10  0x0000555555d65ed0 in phys_sections_free (map=0x555556d4b410) at ../softmmu/physmem.c:1011
11  0x0000555555d69578 in address_space_dispatch_free (d=0x555556d4b400) at ../softmmu/physmem.c:2430
12  0x0000555555d58412 in flatview_destroy (view=0x5555572bb090) at ../softmmu/memory.c:292
13  0x000055555600fd23 in call_rcu_thread (opaque=0x0) at ../util/rcu.c:284
14  0x00005555560026d4 in qemu_thread_start (args=0x5555569cafa0) at ../util/qemu-thread-posix.c:541
15  0x00007ffff7094b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
16  0x00007ffff7126a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

or this:

0x0000555555e1dc80 in object_unref (objptr=0x6d656d3c6b6e696c) at ../qom/object.c:1198
1198        g_assert(obj->ref > 0);
(gdb) bt
 0  0x0000555555e1dc80 in object_unref (objptr=0x6d656d3c6b6e696c) at ../qom/object.c:1198
 1  0x0000555555d5cca5 in memory_region_unref (mr=0x5555572b9e20) at ../softmmu/memory.c:1799
 2  0x0000555555d65e47 in phys_section_destroy (mr=0x5555572b9e20) at ../softmmu/physmem.c:998
 3  0x0000555555d65ec7 in phys_sections_free (map=0x5555588365c0) at ../softmmu/physmem.c:1011
 4  0x0000555555d6956f in address_space_dispatch_free (d=0x5555588365b0) at ../softmmu/physmem.c:2430
 5  0x0000555555d58409 in flatview_destroy (view=0x555558836570) at ../softmmu/memory.c:292
 6  0x000055555600fd1a in call_rcu_thread (opaque=0x0) at ../util/rcu.c:284
 7  0x00005555560026cb in qemu_thread_start (args=0x5555569cafa0) at ../util/qemu-thread-posix.c:541
 8  0x00007ffff7094b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
 9  0x00007ffff7126a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The reason seems to be that memory regions are handled on a different
thread than the virtio-gpu thread, and that inevitably leads to
raciness.
The memory region docs[a] generally seem to dissuade this:

"In order to do this, as a general rule do not create or destroy
memory regions dynamically during a device’s lifetime, and only
call object_unparent() in the memory region owner’s instance_finalize
callback. The dynamically allocated data structure that contains
the memory region then should obviously be freed in the
instance_finalize callback as well."

instance_finalize is called just before device destruction, though,
so storing the memory until then is unlikely to be an option. The
tests do pass when virtio-gpu doesn't free the memory, but
progressively the guest becomes slower and then OOMs.

Though the API does make an exception:

"There is an exception to the above rule: it is okay to call
object_unparent at any time for an alias or a container region. It is
therefore also okay to create or destroy alias and container regions
dynamically during a device’s lifetime."

I believe we are trying to create a container subregion, but that's
still failing? Are we doing it right? Can any memory region experts
here help out? The other relevant patch in this series
is "virtio-gpu: hostmem".

[a] https://qemu.readthedocs.io/en/latest/devel/memory.html

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
---
 hw/display/virtio-gpu-rutabaga.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
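For completeness, here is a minimal sketch of the lifecycle the docs
quoted above actually recommend: unparent the region only in the owner's
instance_finalize callback, and free the allocation containing it there
as well. Type and field names are hypothetical; as the commit message
notes, deferring the free this long is likely unworkable for per-mapping
blob memory.

#include "qemu/osdep.h"
#include "hw/qdev-core.h"
#include "exec/memory.h"

/* Hypothetical device state; 'blob' would be allocated at map time. */
typedef struct ExampleBlob {
    MemoryRegion mr;
} ExampleBlob;

typedef struct ExampleDeviceState {
    DeviceState parent_obj;
    ExampleBlob *blob;
} ExampleDeviceState;

static void example_device_instance_finalize(Object *obj)
{
    ExampleDeviceState *s = (ExampleDeviceState *)obj;

    if (s->blob) {
        /* Per the docs: only unparent the region here, at finalize... */
        object_unparent(OBJECT(&s->blob->mr));
        /* ...and free the structure containing it here as well. */
        g_free(s->blob);
        s->blob = NULL;
    }
}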