
[v6,00/10] Support virtio-gpu DRM native context

Message ID 20250126201121.470990-1-dmitry.osipenko@collabora.com (mailing list archive)

Message

Dmitry Osipenko Jan. 26, 2025, 8:11 p.m. UTC
This patchset adds DRM native context support to VirtIO-GPU on Qemu.

Contrary to the Virgl and Venus contexts, which mediate high-level GFX APIs,
DRM native context [1] mediates the lower-level kernel driver UAPI, which
results in lower CPU overhead and less/simpler code needed to support it.
A DRM context consists of host and guest parts that have to be implemented
for each GPU driver. On the guest side, a DRM context presents the virtual
GPU as a real/native host GPU device to GL/VK applications.

[1] https://www.youtube.com/watch?v=9sFP_yddLLQ

Today there are four DRM native context drivers in the wild:

  - Freedreno (Qualcomm SoC GPUs), completely upstreamed
  - AMDGPU, completely upstreamed
  - Intel (i915), merge requests are opened
  - Asahi (Apple SoC GPUs), partially merged upstream

# How to try out DRM context:

1. DRM context uses host blobs and, on the host, requires Linux kernel 6.13
or later, which contains the necessary KVM fixes.

2. Use the latest Mesa (both guest and host) and libvirglrenderer versions.
Use the build flags documented in patch #10 of this series (an illustrative
example follows below).

3. On the guest, use Linux kernel v6.14-rc or newer.
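
Example build flags (illustrative only; the exact option names may differ
between releases, so see patch #10 of this series for the authoritative list):

  # host: virglrenderer with DRM native-context renderers enabled
  meson setup build -Ddrm-renderers=amdgpu-experimental,msm

  # guest: Mesa with the virtio-backed Freedreno KMD, as one example
  meson setup build -Dgallium-drivers=freedreno -Dfreedreno-kmds=msm,virtio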

Example Qemu cmdline that enables DRM context:

  qemu-system-x86_64 -device virtio-vga-gl,hostmem=4G,blob=on,drm_native_context=on \
      -machine q35,accel=kvm,memory-backend=mem1 \
      -object memory-backend-memfd,id=mem1,size=8G -m 8G
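
To check that the guest actually picked up a native context, the renderer
string reported inside the guest should name the real host GPU rather than
"virgl", for example:

  glxinfo -B | grep -i renderer
  vulkaninfo --summary | grep -i deviceName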

# Note about a known performance problem in Qemu:

DRM contexts map host blobs extensively, and these mapping operations are
slow in Qemu; the exact reason is unknown. Mappings are fast on Crosvm. For
DRM contexts this problem is more visible than for Venus/Virgl.
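
For a rough comparison of the mapping overhead, the same guest can be booted
with the Venus context instead, e.g. (assuming a QEMU and virglrenderer
built with Venus support):

  qemu-system-x86_64 -device virtio-vga-gl,hostmem=4G,blob=on,venus=on \
      -machine q35,accel=kvm,memory-backend=mem1 \
      -object memory-backend-memfd,id=mem1,size=8G -m 8G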

Changelog:

v6: - Fixed a compilation warning with older versions of virglrenderer,
      which wasn't fixed properly in v5.

    - Added t-bs from Alex Bennée.

    - Added patches that improve the virgl/venus docs by adding links
      to the Mesa docs, as suggested by Akihiko Odaki.

    - Updated the patch that documents guest/host requirements. Added
      links to Asahi nctx and reworked the doc structure by adding
      requirements to each context-type section instead of having
      one big blob of requirements, which Akihiko Odaki objected to.

v5: - Added r-bs from Akihiko Odaki.

    - Added acks from Michael Tsirkin.

    - Fixed a compilation warning with older versions of virglrenderer
      that was reported by Alex Bennée. Noticed that the old
      virgl_write_fence() code needs to be kept around for older
      virglrenderer in the "Support asynchronous fencing" patch, so added
      it back and verified that old virglrenderer works properly.

    - Added a new patch from Alex Bennée that adds more virtio-gpu
      documentation, with a couple of corrections and additions from me.

    - Rebased patches on top of latest staging tree.

v4: - Improved the SDL2/dmabuf patch by reusing the existing Meson X11
      config option, better handling EGL errors and extending the comment
      explaining that it's safe to enable the SDL2 EGL preference hint,
      as suggested by Akihiko Odaki.

    - Replaced another QSLIST_FOREACH_SAFE with QSLIST_EMPTY+FIRST in
      the async-fencing patch for more code consistency, as suggested
      by Akihiko Odaki.

    - Added missing braces around if-statement that was spotted by
      Alex Bennée.

    - Renamed the 'drm=on' option of the virtio-gpu-gl device to
      'drm_native_context=on' for more clarity, as suggested by
      Alex Bennée. Haven't added the new context-type option that was
      also proposed by Alex; might do it in a separate patch. This
      context-type option would duplicate and deprecate the existing
      options, but in the longer run it's likely worth adding.

    - Dropped the Linux headers-update patch as the headers have been
      updated in the staging tree.

v3: - Improved the EGL presence-check code on X11 systems for the SDL2
      hint that prefers EGL over GLX by using better ifdefs and checking
      Xlib presence at build time to avoid a build failure if libSDL2
      and the system are configured with X11 support disabled. Also added
      a clarifying comment explaining that the X11 hint doesn't affect
      Wayland systems. Suggested by Akihiko Odaki.

    - Corrected a strerror(err) call that used a negative error code where
      it should be positive and vice versa, which was caught by Akihiko
      Odaki. Added a clarifying comment for the case where we get a
      positive error code from virglrenderer, which differs from other
      virglrenderer API functions.

    - Improved QSLIST usage by dropping the mutex protecting the async
      fence list and using the atomic variants of the QSLIST helpers
      instead. Switched away from the FOREACH helper to improve the
      readability of the code, showing that we don't process the list in
      a suboptimal way. As suggested by Akihiko Odaki.

    - Updated patchset base to Venus v18.

v2: - Updated the SDL2-dmabuf patch by making use of error_report() and
      checking the presence of X11+EGL in the system before making SDL2
      prefer the EGL backend over GLX, as suggested by Akihiko Odaki.

    - Improved SDL2's dmabuf-presence check that wasn't done properly
      in v1, where EGL was set up only after the first console was fully
      initialized, and thus SDL's display .has_dmabuf callback didn't work
      for the first console. Now the dmabuf support status is pre-checked
      before the console is registered.

    - Updated commit description of the patch that fixes SDL2's context
      switching logic with a more detailed explanation of the problem.
      Suggested by Akihiko Odaki.

    - Corrected a rebase typo in the async-fencing patch and switched
      async-fencing to use a singly-linked list instead of a doubly-linked
      one, as suggested by Akihiko Odaki.

    - Replaced "=true" with "=on" in the DRM native context documentation
      example and made virtio_gpu_virgl_init() to fail with a error message
      if DRM context can't be initialized instead of giving a warning
      message, as was suggested by Akihiko Odaki.

    - Added patchew's dependency tag to the cover letter, as suggested by
      Akihiko Odaki.

Alex Bennée (1):
  docs/system: virtio-gpu: Document host/guest requirements

Dmitry Osipenko (8):
  ui/sdl2: Restore original context after new context creation
  virtio-gpu: Handle virgl fence creation errors
  virtio-gpu: Support asynchronous fencing
  virtio-gpu: Support DRM native context
  ui/sdl2: Don't disable scanout when display is refreshed
  ui/gtk: Don't disable scanout when display is refreshed
  docs/system: virtio-gpu: Add link to Mesa VirGL doc
  docs/system: virtio-gpu: Update Venus link

Pierre-Eric Pelloux-Prayer (1):
  ui/sdl2: Implement dpy dmabuf functions

 docs/system/devices/virtio-gpu.rst | 123 +++++++++++++++++++++-
 hw/display/virtio-gpu-gl.c         |   5 +
 hw/display/virtio-gpu-virgl.c      | 164 ++++++++++++++++++++++++++++-
 hw/display/virtio-gpu.c            |  15 +++
 include/hw/virtio/virtio-gpu.h     |  16 +++
 include/ui/sdl2.h                  |   7 ++
 meson.build                        |   6 +-
 ui/gtk-egl.c                       |   1 -
 ui/gtk-gl-area.c                   |   1 -
 ui/sdl2-gl.c                       |  68 +++++++++++-
 ui/sdl2.c                          |  42 ++++++++
 11 files changed, 437 insertions(+), 11 deletions(-)

Comments

Alex Bennée Jan. 27, 2025, 4:17 p.m. UTC | #1
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> This patchset adds DRM native context support to VirtIO-GPU on Qemu.
>
> Contarary to Virgl and Venus contexts that mediates high level GFX APIs,
> DRM native context [1] mediates lower level kernel driver UAPI, which
> reflects in a less CPU overhead and less/simpler code needed to support it.
> DRM context consists of a host and guest parts that have to be implemented
> for each GPU driver. On a guest side, DRM context presents a virtual GPU as
> a real/native host GPU device for GL/VK applications.
>
> [1] https://www.youtube.com/watch?v=9sFP_yddLLQ
>
> Today there are four DRM native context drivers existing in a wild:
>
>   - Freedreno (Qualcomm SoC GPUs), completely upstreamed
>   - AMDGPU, completely upstreamed

Well good news and bad news.

I can verify that AMD native context works when I run my Aarch64 guest
on my Aarch64 host with -accel TCG (therefore avoiding KVM altogether).
I get potato frame rates though (~150FPS), although I suspect that is
because of the PCI errata workaround.

When it comes to graphics memory allocation, is there anything I can do
to force all allocations to be well aligned? Is this in the purview of
the AMD DRM drivers or TTM itself?

I'm still seeing corruption with -display gtk,gl=on on my x86 system
BTW. I would like to understand if that is a problem with QEMU, GTK or
something else in the stack before we merge.

>   - Intel (i915), merge requests are opened
>   - Asahi (Apple SoC GPUs), partially merged upstream
>
<snip>
Dmitry Osipenko Jan. 31, 2025, 5:13 p.m. UTC | #2
On 1/27/25 19:17, Alex Bennée wrote:
> Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:
> 
>> This patchset adds DRM native context support to VirtIO-GPU on Qemu.
>>
>> Contarary to Virgl and Venus contexts that mediates high level GFX APIs,
>> DRM native context [1] mediates lower level kernel driver UAPI, which
>> reflects in a less CPU overhead and less/simpler code needed to support it.
>> DRM context consists of a host and guest parts that have to be implemented
>> for each GPU driver. On a guest side, DRM context presents a virtual GPU as
>> a real/native host GPU device for GL/VK applications.
>>
>> [1] https://www.youtube.com/watch?v=9sFP_yddLLQ
>>
>> Today there are four DRM native context drivers existing in a wild:
>>
>>   - Freedreno (Qualcomm SoC GPUs), completely upstreamed
>>   - AMDGPU, completely upstreamed
> 
> Well good news and bad news.
> 
> I can verify that AMD native context works when I run my Aarch64 guest
> on my Aarch64 host with -accel TCG (therefor avoiding KVM all together).
> I get potato frame rates though (~150FPS) although I suspect that is
> because the PCI errata workaround.
> 
> When it comes to graphics memory allocation is there anything I can do
> to force all allocations to be very aligned? Is this in the purview of
> the AMD drm drivers or TTM itself?

All GPU allocations should be aligned to a page size. The alignment is
specified by the AMD driver. I don't expect that alignment is the problem.
What's the size of your host and guest pages?
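
For example, a quick way to check on both sides:

  getconf PAGE_SIZE    # base page size in bytes, on host and guest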
Dmitry Osipenko Feb. 2, 2025, 3:17 p.m. UTC | #3
On 1/27/25 19:17, Alex Bennée wrote:
...
> I'm still seeing corruption with -display gtk,gl=on on my x86 system
> BTW. I would like to understand if that is a problem with QEMU, GTK or
> something else in the stack before we merge.

I reproduced the display mirroring/corruption issue and bisected it to
the following commit. The problem only happens when QEMU/GTK uses the
Wayland display directly; previously I was running QEMU with XWayland,
which doesn't have the problem. Why this change breaks dmabuf display
with Wayland/GTK is unclear. Reverting the commit fixes the bug.

+Dongwon Kim +Vivek Kasireddy

commit 77bf310084dad38b3a2badf01766c659056f1cf2
Author: Dongwon Kim <dongwon.kim@intel.com>
Date:   Fri Apr 26 15:50:59 2024 -0700

    ui/gtk: Draw guest frame at refresh cycle

    Draw routine needs to be manually invoked in the next refresh
    if there is a scanout blob from the guest. This is to prevent
    a situation where there is a scheduled draw event but it won't
    happen bacause the window is currently in inactive state
    (minimized or tabified). If draw is not done for a long time,
    gl_block timeout and/or fence timeout (on the guest) will happen
    eventually.

    v2: Use gd_gl_area_draw(vc) in gtk-gl-area.c

    Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
    Cc: Gerd Hoffmann <kraxel@redhat.com>
    Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
    Cc: Daniel P. Berrangé <berrange@redhat.com>
    Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
    Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Message-Id: <20240426225059.3871283-1-dongwon.kim@intel.com>
Alex Bennée Feb. 3, 2025, 9:07 a.m. UTC | #4
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> On 1/27/25 19:17, Alex Bennée wrote:
> ...
>> I'm still seeing corruption with -display gtk,gl=on on my x86 system
>> BTW. I would like to understand if that is a problem with QEMU, GTK or
>> something else in the stack before we merge.
>
> I reproduced the display mirroring/corruption issue and bisected it to
> the following commit. The problem only happens when QEMU/GTK uses
> Wayland display directly, while previously I was running QEMU with
> XWayland that doesn't have the problem. Why this change breaks dmabuf
> displaying with Wayland/GTK is unclear.

Ahh that makes sense - I obviously forgot to mention I'm running
sway/wayland across both machines.

> Reverting commit fixes the bug.
>
> +Dongwon Kim +Vivek Kasireddy
>
> commit 77bf310084dad38b3a2badf01766c659056f1cf2
> Author: Dongwon Kim <dongwon.kim@intel.com>
> Date:   Fri Apr 26 15:50:59 2024 -0700
>
>     ui/gtk: Draw guest frame at refresh cycle
>
>     Draw routine needs to be manually invoked in the next refresh
>     if there is a scanout blob from the guest. This is to prevent
>     a situation where there is a scheduled draw event but it won't
>     happen bacause the window is currently in inactive state
>     (minimized or tabified). If draw is not done for a long time,
>     gl_block timeout and/or fence timeout (on the guest) will happen
>     eventually.
>
>     v2: Use gd_gl_area_draw(vc) in gtk-gl-area.c
>
>     Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>     Cc: Gerd Hoffmann <kraxel@redhat.com>
>     Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
>     Cc: Daniel P. Berrangé <berrange@redhat.com>
>     Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
>     Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>     Message-Id: <20240426225059.3871283-1-dongwon.kim@intel.com>


Maybe a race on:

    QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
?
Kim, Dongwon Feb. 3, 2025, 6:15 p.m. UTC | #5
Hi,

The commit below could change the timing of drawing by making the drawing
happen at the refresh cycle instead of via a drawing event. So it looks like
either the dmabuf or the client's framebuffer is being written and read at the
same time. Hey, can you describe what the corruption looks like? Is it just a
garbage image with random noise, or the actual frame with some defects like
tearing...?

> Subject: Re: [PATCH v6 00/10] Support virtio-gpu DRM native context
> 
> Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:
> 
> > On 1/27/25 19:17, Alex Bennée wrote:
> > ...
> >> I'm still seeing corruption with -display gtk,gl=on on my x86 system
> >> BTW. I would like to understand if that is a problem with QEMU, GTK
> >> or something else in the stack before we merge.
> >
> > I reproduced the display mirroring/corruption issue and bisected it to
> > the following commit. The problem only happens when QEMU/GTK uses
> > Wayland display directly, while previously I was running QEMU with
> > XWayland that doesn't have the problem. Why this change breaks dmabuf
> > displaying with Wayland/GTK is unclear.
> 
> Ahh that makes sense - I obviously forgot to mention I'm running sway/wayland
> across both machines.
> 
> > Reverting commit fixes the bug.
> >
> > +Dongwon Kim +Vivek Kasireddy
> >
> > commit 77bf310084dad38b3a2badf01766c659056f1cf2
> > Author: Dongwon Kim <dongwon.kim@intel.com>
> > Date:   Fri Apr 26 15:50:59 2024 -0700
> >
> >     ui/gtk: Draw guest frame at refresh cycle
> >
> >     Draw routine needs to be manually invoked in the next refresh
> >     if there is a scanout blob from the guest. This is to prevent
> >     a situation where there is a scheduled draw event but it won't
> >     happen bacause the window is currently in inactive state
> >     (minimized or tabified). If draw is not done for a long time,
> >     gl_block timeout and/or fence timeout (on the guest) will happen
> >     eventually.
> >
> >     v2: Use gd_gl_area_draw(vc) in gtk-gl-area.c
> >
> >     Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> >     Cc: Gerd Hoffmann <kraxel@redhat.com>
> >     Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
> >     Cc: Daniel P. Berrangé <berrange@redhat.com>
> >     Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
> >     Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> >     Message-Id: <20240426225059.3871283-1-dongwon.kim@intel.com>
> 
> 
> Maybe a race on:
> 
>     QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf; ?
> 
> --
> Alex Bennée
> Virtualisation Tech Lead @ Linaro
Alex Bennée Feb. 3, 2025, 7:31 p.m. UTC | #6
"Kim, Dongwon" <dongwon.kim@intel.com> writes:

> Hi,
>
> The commit below could change the timing of drawing by making the drawing
> done at refresh cycle instead of via drawing event. So it looks like either dmabuf
> or client's framebuffer is being written and read at the same time. Hey, can you
> describe how the corruption looks like? Is it just garbage image with random noise
> or the actual frame with some defects like tearing...?

The terminal gets mirrored upside down and the mouse creates damage as
it moves about.

>
>> Subject: Re: [PATCH v6 00/10] Support virtio-gpu DRM native context
>> 
>> Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:
>> 
>> > On 1/27/25 19:17, Alex Bennée wrote:
>> > ...
>> >> I'm still seeing corruption with -display gtk,gl=on on my x86 system
>> >> BTW. I would like to understand if that is a problem with QEMU, GTK
>> >> or something else in the stack before we merge.
>> >
>> > I reproduced the display mirroring/corruption issue and bisected it to
>> > the following commit. The problem only happens when QEMU/GTK uses
>> > Wayland display directly, while previously I was running QEMU with
>> > XWayland that doesn't have the problem. Why this change breaks dmabuf
>> > displaying with Wayland/GTK is unclear.
>> 
>> Ahh that makes sense - I obviously forgot to mention I'm running sway/wayland
>> across both machines.
>> 
>> > Reverting commit fixes the bug.
>> >
>> > +Dongwon Kim +Vivek Kasireddy
>> >
>> > commit 77bf310084dad38b3a2badf01766c659056f1cf2
>> > Author: Dongwon Kim <dongwon.kim@intel.com>
>> > Date:   Fri Apr 26 15:50:59 2024 -0700
>> >
>> >     ui/gtk: Draw guest frame at refresh cycle
>> >
>> >     Draw routine needs to be manually invoked in the next refresh
>> >     if there is a scanout blob from the guest. This is to prevent
>> >     a situation where there is a scheduled draw event but it won't
>> >     happen bacause the window is currently in inactive state
>> >     (minimized or tabified). If draw is not done for a long time,
>> >     gl_block timeout and/or fence timeout (on the guest) will happen
>> >     eventually.
>> >
>> >     v2: Use gd_gl_area_draw(vc) in gtk-gl-area.c
>> >
>> >     Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>> >     Cc: Gerd Hoffmann <kraxel@redhat.com>
>> >     Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
>> >     Cc: Daniel P. Berrangé <berrange@redhat.com>
>> >     Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
>> >     Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>> >     Message-Id: <20240426225059.3871283-1-dongwon.kim@intel.com>
>> 
>> 
>> Maybe a race on:
>> 
>>     QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf; ?
>> 
>> --
>> Alex Bennée
>> Virtualisation Tech Lead @ Linaro
Kim, Dongwon Feb. 4, 2025, 1:51 a.m. UTC | #7
> Subject: Re: [PATCH v6 00/10] Support virtio-gpu DRM native context
> 
> "Kim, Dongwon" <dongwon.kim@intel.com> writes:
> 
> > Hi,
> >
> > The commit below could change the timing of drawing by making the
> > drawing done at refresh cycle instead of via drawing event. So it
> > looks like either dmabuf or client's framebuffer is being written and
> > read at the same time. Hey, can you describe how the corruption looks
> > like? Is it just garbage image with random noise or the actual frame with some
> defects like tearing...?
> 
> The terminal gets mirrored upside down and the mouse creates damage as it
> moves about.

I am wondering if this is reproducible without virgl and drm native context (like w/
sw rasterizer on the guest) as well. 

> 
> >
> >> Subject: Re: [PATCH v6 00/10] Support virtio-gpu DRM native context
> >>
> >> Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:
> >>
> >> > On 1/27/25 19:17, Alex Bennée wrote:
> >> > ...
> >> >> I'm still seeing corruption with -display gtk,gl=on on my x86
> >> >> system BTW. I would like to understand if that is a problem with
> >> >> QEMU, GTK or something else in the stack before we merge.
> >> >
> >> > I reproduced the display mirroring/corruption issue and bisected it
> >> > to the following commit. The problem only happens when QEMU/GTK
> >> > uses Wayland display directly, while previously I was running QEMU
> >> > with XWayland that doesn't have the problem. Why this change breaks
> >> > dmabuf displaying with Wayland/GTK is unclear.
> >>
> >> Ahh that makes sense - I obviously forgot to mention I'm running
> >> sway/wayland across both machines.
> >>
> >> > Reverting commit fixes the bug.
> >> >
> >> > +Dongwon Kim +Vivek Kasireddy
> >> >
> >> > commit 77bf310084dad38b3a2badf01766c659056f1cf2
> >> > Author: Dongwon Kim <dongwon.kim@intel.com>
> >> > Date:   Fri Apr 26 15:50:59 2024 -0700
> >> >
> >> >     ui/gtk: Draw guest frame at refresh cycle
> >> >
> >> >     Draw routine needs to be manually invoked in the next refresh
> >> >     if there is a scanout blob from the guest. This is to prevent
> >> >     a situation where there is a scheduled draw event but it won't
> >> >     happen bacause the window is currently in inactive state
> >> >     (minimized or tabified). If draw is not done for a long time,
> >> >     gl_block timeout and/or fence timeout (on the guest) will happen
> >> >     eventually.
> >> >
> >> >     v2: Use gd_gl_area_draw(vc) in gtk-gl-area.c
> >> >
> >> >     Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> >> >     Cc: Gerd Hoffmann <kraxel@redhat.com>
> >> >     Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
> >> >     Cc: Daniel P. Berrangé <berrange@redhat.com>
> >> >     Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
> >> >     Acked-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> >> >     Message-Id: <20240426225059.3871283-1-dongwon.kim@intel.com>
> >>
> >>
> >> Maybe a race on:
> >>
> >>     QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf; ?
> >>
> >> --
> >> Alex Bennée
> >> Virtualisation Tech Lead @ Linaro
> 
> --
> Alex Bennée
> Virtualisation Tech Lead @ Linaro
Dmitry Osipenko Feb. 5, 2025, 6:37 p.m. UTC | #8
On 2/4/25 04:51, Kim, Dongwon wrote:
>> Subject: Re: [PATCH v6 00/10] Support virtio-gpu DRM native context
>>
>> "Kim, Dongwon" <dongwon.kim@intel.com> writes:
>>
>>> Hi,
>>>
>>> The commit below could change the timing of drawing by making the
>>> drawing done at refresh cycle instead of via drawing event. So it
>>> looks like either dmabuf or client's framebuffer is being written and
>>> read at the same time. Hey, can you describe how the corruption looks
>>> like? Is it just garbage image with random noise or the actual frame with some
>> defects like tearing...?
>>
>> The terminal gets mirrored upside down and the mouse creates damage as it
>> moves about.
> 
> I am wondering if this is reproducible without virgl and drm native context (like w/
> sw rasterizer on the guest) as well. 

It looks like a problem with redraw areas, see video sample [1].

[1]
https://drive.google.com/file/d/13PN2sFoPsM2ox6_gf9GLXStbsZ27xlLy/view?usp=sharing

It's reproducible without native context, with a SW renderer, using
'-device virtio-vga,hostmem=8G,blob=on'.
Kim, Dongwon Feb. 5, 2025, 6:53 p.m. UTC | #9
Thanks for showing me the video. I will take a look and check what could go
wrong here. I can see how corruption may happen in some scenarios, but I don't
know what could cause the upside-down image. Do you have any idea? Maybe the
frame was temporarily replaced with a mishandled texture that QEMU creates
from the surface, but I am not sure.

In the worst case, I think we may have to revert the change in gtk-gl-area.c
for now until the problem is root-caused. But then you would see some
rendering timeouts when the GTK window is in an invisible state (e.g.
minimized).

> Subject: Re: [PATCH v6 00/10] Support virtio-gpu DRM native context
> 
> On 2/4/25 04:51, Kim, Dongwon wrote:
> >> Subject: Re: [PATCH v6 00/10] Support virtio-gpu DRM native context
> >>
> >> "Kim, Dongwon" <dongwon.kim@intel.com> writes:
> >>
> >>> Hi,
> >>>
> >>> The commit below could change the timing of drawing by making the
> >>> drawing done at refresh cycle instead of via drawing event. So it
> >>> looks like either dmabuf or client's framebuffer is being written
> >>> and read at the same time. Hey, can you describe how the corruption
> >>> looks like? Is it just garbage image with random noise or the actual
> >>> frame with some
> >> defects like tearing...?
> >>
> >> The terminal gets mirrored upside down and the mouse creates damage
> >> as it moves about.
> >
> > I am wondering if this is reproducible without virgl and drm native
> > context (like w/ sw rasterizer on the guest) as well.
> 
> It looks like a problem with redraw areas, see video sample [1].
> 
> [1]
> https://drive.google.com/file/d/13PN2sFoPsM2ox6_gf9GLXStbsZ27xlLy/view?u
> sp=sharing
> 
> It's reproducible without native context with a SW render using '-device virtio-
> vga,hostmem=8G,blob=on'
> 
> --
> Best regards,
> Dmitry
Dmitry Osipenko Feb. 5, 2025, 8:08 p.m. UTC | #10
On 2/5/25 21:53, Kim, Dongwon wrote:
> Thanks for showing me the video. I will take a look and check what would go wrong here. I kinda understand corruption may happen
> in some scenario but I don't know what could cause the upside down image. Do you have any idea?? Maybe the frame was temporarily replaced with
> a mishandled texture that QEMU creates from the surface temporarily but I am not sure..

No clue. Could be anything. Could be a GTK/Wayland bug, could be an
obscure QEMU bug. GTK expert wanted here.

> In the worst case scenario, I think we may have to revert the change from gtk-gl-area.c for now until the problem is root-caused.

Sounds good to me

> But you would see some rendering timeout in case the GTK window is in invisible state (like minimization).

No timeouts observed
Dmitry Osipenko Feb. 5, 2025, 10:13 p.m. UTC | #11
On 2/5/25 23:08, Dmitry Osipenko wrote:
>> Thanks for showing me the video. I will take a look and check what would go wrong here. I kinda understand corruption may happen
>> in some scenario but I don't know what could cause the upside down image. Do you have any idea?? Maybe the frame was temporarily replaced with
>> a mishandled texture that QEMU creates from the surface temporarily but I am not sure..
> No clue. Could be anything. Could be a GTK/Wayland bug, could be an
> obscure QEMU bug. GTK expert wanted here.

Alright, it's broken with "blob=on" but works with "blob=off", while I
don't see QEMU using blobs. Might be a QEMU bug then.
Dmitry Osipenko Feb. 12, 2025, 2:46 a.m. UTC | #12
On 2/6/25 01:13, Dmitry Osipenko wrote:
> On 2/5/25 23:08, Dmitry Osipenko wrote:
>>> Thanks for showing me the video. I will take a look and check what would go wrong here. I kinda understand corruption may happen
>>> in some scenario but I don't know what could cause the upside down image. Do you have any idea?? Maybe the frame was temporarily replaced with
>>> a mishandled texture that QEMU creates from the surface temporarily but I am not sure..
>> No clue. Could be anything. Could be a GTK/Wayland bug, could be an
>> obscure QEMU bug. GTK expert wanted here.
> 
> Alright, it's bugged with "blob=on", but works with "blob=off". While I
> don't see QEMU using blobs. Might be QEMU's bug then.
> 

Looked further at it. QEMU was indeed using blobs with "blob=on" and I was
looking at the wrong place. Then I found that setting y0_top=true for the
dmabuf makes the display show upside down, but there is no rendering bug
with it. Something redraws the display with y0_top=true while it should be
y0_top=false. I couldn't figure out how it's related to the offending
change.

I also noticed that QEMU checks for Wayland presence in
early_gtk_display_init() and doesn't use the EGL callbacks that are used for
the X11 display, but the y0_top logic looks the same. Interestingly, Windows
should be using the same display code paths as Wayland, but I don't have the
ability to test Windows.

If nobody is able to fix the bug soon, at least reverting the Wayland part
would be good to do.
Vivek Kasireddy Feb. 14, 2025, 6:03 a.m. UTC | #13
Hi Dmitry,

> Subject: Re: [PATCH v6 00/10] Support virtio-gpu DRM native context
> 
> On 2/6/25 01:13, Dmitry Osipenko wrote:
> > On 2/5/25 23:08, Dmitry Osipenko wrote:
> >>> Thanks for showing me the video. I will take a look and check what would
> go wrong here. I kinda understand corruption may happen
> >>> in some scenario but I don't know what could cause the upside down
> image. Do you have any idea?? Maybe the frame was temporarily replaced
> with
> >>> a mishandled texture that QEMU creates from the surface temporarily but
> I am not sure..
> >> No clue. Could be anything. Could be a GTK/Wayland bug, could be an
> >> obscure QEMU bug. GTK expert wanted here.
> >
> > Alright, it's bugged with "blob=on", but works with "blob=off". While I
> > don't see QEMU using blobs. Might be QEMU's bug then.
> >
> 
> Looked further at it. So QEMU was using blobs with "blob=on" and I was
> looking at a wrong place. Then I found that setting y0_top=true for
> dmabuf makes display to show upside down, but there is no rendering bug
> with it. Something redraws display with y0_top=true, while it should be
> y0_top=false. I couldn't figure out how it's related to the offending
> change.
It is possible that the additional draw call (the offending change) is
exposing a bug elsewhere. This is because having both blob=on and virgl=on was
not a valid combo until recently. IIUC, virgl directly calls
gd_gl_area_scanout_texture() with y0_top=true (sometimes), but I guess
gd_gl_area_scanout_texture() might be getting called again with y0_top=false
via gd_gl_area_scanout_dmabuf() in the same repaint cycle?

> 
> I also noticed that QEMU checks Wayland presence in
> early_gtk_display_init() and doesn't use EGL callbacks that are used for
> X11 display, but the y0_top logic looks the same. Interestingly, Windows
> should be using same display code paths as Wayland, but I don't have
> ability to test Windows.
> 
> If nobody will be able to fix the bug soon, at least reverting the
> Wayland part will be good to do.
Hope we can find a proper fix soon, but it looks like Dongwon has already
sent the patch to revert the changes in gtk-gl-area.c.

Thanks,
Vivek

> 
> --
> Best regards,
> Dmitry
Alex Bennée Feb. 14, 2025, 2:33 p.m. UTC | #14
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> This patchset adds DRM native context support to VirtIO-GPU on Qemu.
>
> Contarary to Virgl and Venus contexts that mediates high level GFX APIs,
> DRM native context [1] mediates lower level kernel driver UAPI, which
> reflects in a less CPU overhead and less/simpler code needed to support it.
> DRM context consists of a host and guest parts that have to be implemented
> for each GPU driver. On a guest side, DRM context presents a virtual GPU as
> a real/native host GPU device for GL/VK applications.
>
<snip>

So first the good news. I can now get this up and running (x86/kvm guest
with Intel graphics) and as far as I can tell the native context mode is
working. With Dongwon Kim's patch the mirroring/corruption I was seeing
is gone.

I can successfully run glmark2-wayland (although see below), but vkmark
completely fails to start, reporting:

  MESA: info: virtgpu backend not enabling VIRTGPU_PARAM_CREATE_FENCE_PASSING
  MESA: info: virtgpu backend not enabling VIRTGPU_PARAM_CREATE_GUEST_HANDLE
  MESA: error: DRM_IOCTL_VIRTGPU_GET_CAPS failed with Invalid argument
  MESA: error: DRM_IOCTL_VIRTGPU_CONTEXT_INIT failed with Invalid argument, continuing without context...
  MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
  MESA: error: Failed to create virtgpu AddressSpaceStream
  MESA: error: vulkan: Failed to get host connection
  MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
  MESA: error: Failed to create virtgpu AddressSpaceStream
  MESA: error: vulkan: Failed to get host connection
  MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
  MESA: error: Failed to create virtgpu AddressSpaceStream
  MESA: error: vulkan: Failed to get host connection
  MESA: warning: ../src/gfxstream/guest/vulkan/gfxstream_vk_device.cpp:681: VK_ERROR_DEVICE_LOST
  MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
  MESA: error: Failed to create virtgpu AddressSpaceStream
  MESA: error: vulkan: Failed to get host connection
  MESA: warning: ../src/gfxstream/guest/vulkan/gfxstream_vk_device.cpp:332: VK_ERROR_DEVICE_LOST
  === Physical Device 0 ===
      Vendor ID:      0x8086
      Device ID:      0xA780
      Device Name:    Intel(R) Graphics (RPL-S)
      Driver Version: 101068899
      Device UUID:    b39e1cf39b101489e3c6039406f78d6c

I was booting with 4G of shared memory.

Later versions of vkmark (2025.01) fail due to missing the
VK_KHR_display extension required as of
https://github.com/vkmark/vkmark/commit/7c3189c6482cb84c3c0e69d6dabb9d80e0c0092a

> # Note about known performance problem in Qemu:
>
> DRM contexts are mapping host blobs extensively and these mapping
> operations work slowly in Qemu. Exact reason is unknown. Mappings work
> fast on Crosvm For DRM contexts this problem is more visible than for
> Venus/Virgl.

And how!

With drm_native I get a lot of stutter while running and barely 100FPS
(compared to ~8000 on pure venus). IMHO we need to figure out why there
is such a discrepancy before merging because currently it makes more
sense to use Venus. 

<snip>

I'll do some more testing with my AMD/Aarch64 rig next week.
Dmitry Osipenko Feb. 14, 2025, 4:03 p.m. UTC | #15
On 2/14/25 17:33, Alex Bennée wrote:
> Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:
> 
>> This patchset adds DRM native context support to VirtIO-GPU on Qemu.
>>
>> Contarary to Virgl and Venus contexts that mediates high level GFX APIs,
>> DRM native context [1] mediates lower level kernel driver UAPI, which
>> reflects in a less CPU overhead and less/simpler code needed to support it.
>> DRM context consists of a host and guest parts that have to be implemented
>> for each GPU driver. On a guest side, DRM context presents a virtual GPU as
>> a real/native host GPU device for GL/VK applications.
>>
> <snip>
> 
> So first the good news. I can now get this up and running (x86/kvm guest
> with Intel graphics) and as far as I can tell the native context mode is
> working. With Dongwon Kim's patch the mirroring/corruption I was seeing
> is gone.
> 
> I can successfully run glmark2-wayland (although see bellow) but vkmark
> completely fails to start reporting:
> 
>   MESA: info: virtgpu backend not enabling VIRTGPU_PARAM_CREATE_FENCE_PASSING
>   MESA: info: virtgpu backend not enabling VIRTGPU_PARAM_CREATE_GUEST_HANDLE
>   MESA: error: DRM_IOCTL_VIRTGPU_GET_CAPS failed with Invalid argument
>   MESA: error: DRM_IOCTL_VIRTGPU_CONTEXT_INIT failed with Invalid argument, continuing without context...
>   MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
>   MESA: error: Failed to create virtgpu AddressSpaceStream
>   MESA: error: vulkan: Failed to get host connection
>   MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
>   MESA: error: Failed to create virtgpu AddressSpaceStream
>   MESA: error: vulkan: Failed to get host connection
>   MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
>   MESA: error: Failed to create virtgpu AddressSpaceStream
>   MESA: error: vulkan: Failed to get host connection
>   MESA: warning: ../src/gfxstream/guest/vulkan/gfxstream_vk_device.cpp:681: VK_ERROR_DEVICE_LOST
>   MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
>   MESA: error: Failed to create virtgpu AddressSpaceStream
>   MESA: error: vulkan: Failed to get host connection
>   MESA: warning: ../src/gfxstream/guest/vulkan/gfxstream_vk_device.cpp:332: VK_ERROR_DEVICE_LOST
>   === Physical Device 0 ===
>       Vendor ID:      0x8086
>       Device ID:      0xA780
>       Device Name:    Intel(R) Graphics (RPL-S)
>       Driver Version: 101068899
>       Device UUID:    b39e1cf39b101489e3c6039406f78d6c
> 
> I was booting with 4G of shared memory.

Thanks for testing.

I assume all these errors are generated by the failing gfxstream. Hence,
you may ignore them since you don't have gfxstream enabled.

> Later versions of vkmark (2025.01) fail due to missing the
> VK_KHR_display extension required as of
> https://github.com/vkmark/vkmark/commit/7c3189c6482cb84c3c0e69d6dabb9d80e0c0092a

This VK_KHR_display problem is only reproducible with the rootfs that
you shared with me. It could be a problem with your build configs or a
buggy package version used by your rootfs build, more likely the former.

>> # Note about known performance problem in Qemu:
>>
>> DRM contexts are mapping host blobs extensively and these mapping
>> operations work slowly in Qemu. Exact reason is unknown. Mappings work
>> fast on Crosvm For DRM contexts this problem is more visible than for
>> Venus/Virgl.
> 
> And how!
> 
> With drm_native I get a lot of stutter while running and barely 100FPS
> (compared to ~8000 on pure venus). IMHO we need to figure out why there
> is such a discrepancy before merging because currently it makes more
> sense to use Venus.

If you'd run with Xorg/Wayland directly without a DE, then it should
work okay. This is likely the unmapping-performance problem that I've been
thinking about.

That unmapping problem is partially understood. The unmapping code works
correctly, but we'll need to optimize the flatview code to perform the
unmapping immediately. Meanwhile, you may apply the QEMU hack below; it
should resolve most of the stutter. Please let me know if it helps.

There is also a pending Mesa intel-virtio blob-mapping optimization that
currently isn't available in my gitlab code; I'll refresh that feature
and then ask you to try it.

Could be that there is more to the unmapping perf issue in QEMU. I'm
investigating.

AMDGPU nctx is less affected by the bad unmapping performance. I expect
it will work well for you.



diff --git a/util/rcu.c b/util/rcu.c
index fa32c942e4bb..aac3522c323c 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -174,7 +174,7 @@ void synchronize_rcu(void)
 }


-#define RCU_CALL_MIN_SIZE        30
+#define RCU_CALL_MIN_SIZE        1

 /* Multi-producer, single-consumer queue based on urcu/static/wfqueue.h
  * from liburcu.  Note that head is only used by the consumer.
@@ -267,7 +267,7 @@ static void *call_rcu_thread(void *opaque)
          * added before synchronize_rcu() starts.
          */
         while (n == 0 || (n < RCU_CALL_MIN_SIZE && ++tries <= 5)) {
-            g_usleep(10000);
+            g_usleep(1000);
             if (n == 0) {
                 qemu_event_reset(&rcu_call_ready_event);
                 n = qatomic_read(&rcu_call_count);
Alex Bennée Feb. 17, 2025, 3:18 p.m. UTC | #16
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> On 1/27/25 19:17, Alex Bennée wrote:
>> Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:
>> 
>>> This patchset adds DRM native context support to VirtIO-GPU on Qemu.
>>>
>>> Contarary to Virgl and Venus contexts that mediates high level GFX APIs,
>>> DRM native context [1] mediates lower level kernel driver UAPI, which
>>> reflects in a less CPU overhead and less/simpler code needed to support it.
>>> DRM context consists of a host and guest parts that have to be implemented
>>> for each GPU driver. On a guest side, DRM context presents a virtual GPU as
>>> a real/native host GPU device for GL/VK applications.
>>>
>>> [1] https://www.youtube.com/watch?v=9sFP_yddLLQ
>>>
>>> Today there are four DRM native context drivers existing in a wild:
>>>
>>>   - Freedreno (Qualcomm SoC GPUs), completely upstreamed
>>>   - AMDGPU, completely upstreamed
>> 
>> Well good news and bad news.
>> 
>> I can verify that AMD native context works when I run my Aarch64 guest
>> on my Aarch64 host with -accel TCG (therefor avoiding KVM all together).
>> I get potato frame rates though (~150FPS) although I suspect that is
>> because the PCI errata workaround.
>> 
>> When it comes to graphics memory allocation is there anything I can do
>> to force all allocations to be very aligned? Is this in the purview of
>> the AMD drm drivers or TTM itself?
>
> All GPU allocations should be aligned to a page size. Alignment is
> specified by AMD driver. I don't expect that alignment is the problem.
> What's the size of your host and guest pages?

4k AFAIK.
Alex Bennée Feb. 17, 2025, 3:22 p.m. UTC | #17
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> On 2/14/25 17:33, Alex Bennée wrote:
>> Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:
>> 
>>> This patchset adds DRM native context support to VirtIO-GPU on Qemu.
>>>
>>> Contarary to Virgl and Venus contexts that mediates high level GFX APIs,
>>> DRM native context [1] mediates lower level kernel driver UAPI, which
>>> reflects in a less CPU overhead and less/simpler code needed to support it.
>>> DRM context consists of a host and guest parts that have to be implemented
>>> for each GPU driver. On a guest side, DRM context presents a virtual GPU as
>>> a real/native host GPU device for GL/VK applications.
>>>
>> <snip>
>> 
>> So first the good news. I can now get this up and running (x86/kvm guest
>> with Intel graphics) and as far as I can tell the native context mode is
>> working. With Dongwon Kim's patch the mirroring/corruption I was seeing
>> is gone.
>> 
>> I can successfully run glmark2-wayland (although see bellow) but vkmark
>> completely fails to start reporting:
>> 
>>   MESA: info: virtgpu backend not enabling VIRTGPU_PARAM_CREATE_FENCE_PASSING
>>   MESA: info: virtgpu backend not enabling VIRTGPU_PARAM_CREATE_GUEST_HANDLE
>>   MESA: error: DRM_IOCTL_VIRTGPU_GET_CAPS failed with Invalid argument
>>   MESA: error: DRM_IOCTL_VIRTGPU_CONTEXT_INIT failed with Invalid argument, continuing without context...
>>   MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
>>   MESA: error: Failed to create virtgpu AddressSpaceStream
>>   MESA: error: vulkan: Failed to get host connection
>>   MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
>>   MESA: error: Failed to create virtgpu AddressSpaceStream
>>   MESA: error: vulkan: Failed to get host connection
>>   MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
>>   MESA: error: Failed to create virtgpu AddressSpaceStream
>>   MESA: error: vulkan: Failed to get host connection
>>   MESA: warning: ../src/gfxstream/guest/vulkan/gfxstream_vk_device.cpp:681: VK_ERROR_DEVICE_LOST
>>   MESA: error: DRM_VIRTGPU_RESOURCE_CREATE_BLOB failed with No space left on device
>>   MESA: error: Failed to create virtgpu AddressSpaceStream
>>   MESA: error: vulkan: Failed to get host connection
>>   MESA: warning: ../src/gfxstream/guest/vulkan/gfxstream_vk_device.cpp:332: VK_ERROR_DEVICE_LOST
>>   === Physical Device 0 ===
>>       Vendor ID:      0x8086
>>       Device ID:      0xA780
>>       Device Name:    Intel(R) Graphics (RPL-S)
>>       Driver Version: 101068899
>>       Device UUID:    b39e1cf39b101489e3c6039406f78d6c
>> 
>> I was booting with 4G of shared memory.
>
> Thanks for the testing.
>
> I assume all these errors are generated by the failing gfxstream. Hence,
> may ignore them since you don't have enabled gfxstream.
>
>> Later versions of vkmark (2025.01) fail due to missing the
>> VK_KHR_display extension required as of
>> https://github.com/vkmark/vkmark/commit/7c3189c6482cb84c3c0e69d6dabb9d80e0c0092a
>
> This VK_KHR_display problem is only reproducible with your rootfs that
> you shared with me. It could be a trouble with your build configs or a
> buggy package version used by your rootfs build, more likely the
> former.

So have you built the latest vkmark? This is a recent addition to
vkmark for the 2025.1 release.

Does vulkaninfo --summary show the extension available for you? It is
certainly available on the host side:

VK_KHR_display                         : extension revision 23

>>> # Note about known performance problem in Qemu:
>>>
>>> DRM contexts are mapping host blobs extensively and these mapping
>>> operations work slowly in Qemu. Exact reason is unknown. Mappings work
>>> fast on Crosvm For DRM contexts this problem is more visible than for
>>> Venus/Virgl.
>> 
>> And how!
>> 
>> With drm_native I get a lot of stutter while running and barely 100FPS
>> (compared to ~8000 on pure venus). IMHO we need to figure out why there
>> is such a discrepancy before merging because currently it makes more
>> sense to use 
> If you'd run with Xorg/Wayland directly without a DE, then it should
> work okay. This should be a problem with unmapping performance that I'm
> thinking about.
>
> That unmapping problem is partially understood. Unmapping code works
> correctly, but we'll need to optimize the flatview code to perform
> unmapping immediately.

Why immediately? Surely if we are unmapping we can defer it. Or is this
a case of having stale mappings making the life of new allocations
harder?

> Meanwhile, you may apply the QEMU hack below, it
> should resolve most of the stutter, please let me know if it helps.
>
> There is also a pending Mesa intel-virtio blob mapping optimization that
> currently isn't available in my gitlab code, I'll refresh that feature
> and then ask you to try it.
>
> Could be that there is more to the unmapping perf issue in QEMU. I'm
> investigating.
>
> AMDGPU nctx is less affected by the bad unmapping performance. I expect
> it will work well for you.
>
>
>
> diff --git a/util/rcu.c b/util/rcu.c
> index fa32c942e4bb..aac3522c323c 100644
> --- a/util/rcu.c
> +++ b/util/rcu.c
> @@ -174,7 +174,7 @@ void synchronize_rcu(void)
>  }
>
>
> -#define RCU_CALL_MIN_SIZE        30
> +#define RCU_CALL_MIN_SIZE        1
>
>  /* Multi-producer, single-consumer queue based on urcu/static/wfqueue.h
>   * from liburcu.  Note that head is only used by the consumer.
> @@ -267,7 +267,7 @@ static void *call_rcu_thread(void *opaque)
>           * added before synchronize_rcu() starts.
>           */
>          while (n == 0 || (n < RCU_CALL_MIN_SIZE && ++tries <= 5)) {
> -            g_usleep(10000);
> +            g_usleep(1000);
>              if (n == 0) {
>                  qemu_event_reset(&rcu_call_ready_event);
>                  n = qatomic_read(&rcu_call_count);
Dmitry Osipenko Feb. 17, 2025, 6:02 p.m. UTC | #18
On 2/17/25 18:22, Alex Bennée wrote:
...
>> This VK_KHR_display problem is only reproducible with your rootfs that
>> you shared with me. It could be a trouble with your build configs or a
>> buggy package version used by your rootfs build, more likely the
>> former.
> So you have built that latest vkmark? This is a recent addition to
> vkmark for the 2025.1 release.

Yes, latest 2025.1 from git/master.

> Does vulkaninfo --summary show the extension available for you? It is
> certainly available on the host side:
> 
> VK_KHR_display                         : extension revision 23
> 

I have it on the guest with my rootfs, not with yours. I suspect the
problem is with your Mesa build flags; maybe you haven't enabled the
necessary flags related to WSI.
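
For example, checking directly in the guest shows whether the extension made
it into the build at all:

  vulkaninfo | grep VK_KHR_display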

..
>>> With drm_native I get a lot of stutter while running and barely 100FPS
>>> (compared to ~8000 on pure venus). IMHO we need to figure out why there
>>> is such a discrepancy before merging because currently it makes more
>>> sense to use 
>> If you'd run with Xorg/Wayland directly without a DE, then it should
>> work okay. This should be a problem with unmapping performance that I'm
>> thinking about.
>>
>> That unmapping problem is partially understood. Unmapping code works
>> correctly, but we'll need to optimize the flatview code to perform
>> unmapping immediately.
> Why immediately? Surely if we are unmapping we can defer it. Or is this
> a case of having stale mappings making the life of new allocations
> harder?

Unmapping currently works synchronously for virtio-gpu in QEMU, hence
deferring it blocks the whole virtio-gpu device for up to 100+ ms. And if
multiple unmappings are done in a row, then it's 100 ms multiplied by the
number of unmappings.
Alex Bennée Feb. 17, 2025, 6:35 p.m. UTC | #19
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> On 2/17/25 18:22, Alex Bennée wrote:
> ...
>>> This VK_KHR_display problem is only reproducible with your rootfs that
>>> you shared with me. It could be a trouble with your build configs or a
>>> buggy package version used by your rootfs build, more likely the
>>> former.
>> So you have built that latest vkmark? This is a recent addition to
>> vkmark for the 2025.1 release.
>
> Yes, latest 2025.1 from git/master.
>
>> Does vulkaninfo --summary show the extension available for you? It is
>> certainly available on the host side:
>> 
>> VK_KHR_display                         : extension revision 23
>> 
>
> Have it on guest with my rootfs, not with yours. I'd suspect problem is
> with the your Mesa build flags, maybe you haven't enabled necessary
> flags related to WSI.

I can't see any reference in the buildroot recipes. What are your Mesa
build flags?

>
> ..
>>>> With drm_native I get a lot of stutter while running and barely 100FPS
>>>> (compared to ~8000 on pure venus). IMHO we need to figure out why there
>>>> is such a discrepancy before merging because currently it makes more
>>>> sense to use 
>>> If you'd run with Xorg/Wayland directly without a DE, then it should
>>> work okay. This should be a problem with unmapping performance that I'm
>>> thinking about.
>>>
>>> That unmapping problem is partially understood. Unmapping code works
>>> correctly, but we'll need to optimize the flatview code to perform
>>> unmapping immediately.
>> Why immediately? Surely if we are unmapping we can defer it. Or is this
>> a case of having stale mappings making the life of new allocations
>> harder?
>
> Unmapping currently works synchronously for virtio-gpu in QEMU, hence
> deferring it blocks whole virtio-gpu up to 100+ ms. And if multiple
> unmappings are done in a row, then it's 100ms multiplied by the number
> of unmappings.