Message ID | 20201231205136.11422-1-mario.kleiner.de@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/amd/display: Fix pageflipping for XOrg in Linux 5.11+ | expand |
I think the problem here is that application A can set the FB and then application B can use getfb2 (say ffmpeg). https://lists.freedesktop.org/archives/dri-devel/2021-January/292761.html would be my alternative patch. (I'm not good at detecting the effects of tearing apparently but tested this avoids the pageflip failure by debug-prints) On Thu, Dec 31, 2020 at 9:52 PM Mario Kleiner <mario.kleiner.de@gmail.com> wrote: > > Commit 816853f9dc4057b6c7ee3c45ca9bd5905 ("drm/amd/display: Set new > format info for converted metadata.") may fix the getfb2 ioctl, but > in exchange it completely breaks all pageflipping for classic user > space, e.g., XOrg, as tested with both amdgpu-ddx and modesetting-ddx. > This leads to massive tearing, broken visual timing/timestamping etc. > > Reason is that the classic pageflip ioctl doesn't allow a fb format > change during flip, and at least X uses classic pageflip ioctl and no > atomic modesetting api at all. > > As one attempted workaround, only set the new format info for converted > metadata if the calling client isn't X. Not sure if this is the best > way, or if a better check would not be "not all atomic clients" or > similar? In any case it works for XOrg X-Server. Checking the ddx > code of intel-ddx/modesetting-ddx/amdgpu-ddx as well as grepping over > Mesa doesn't show any users of the getfb2 ioctl(), so the need for this > format info assignment seems to be more the exception than the rule? > > Fixes: 816853f9dc40 ("drm/amd/display: Set new format info for converted metadata.") > Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> > Cc: Alex Deucher <alexander.deucher@amd.com> > Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c > index f764803c53a4..cb414b3d327a 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c > @@ -828,7 +828,8 @@ static int convert_tiling_flags_to_modifier(struct amdgpu_framebuffer *afb) > if (!format_info) > return -EINVAL; > > - afb->base.format = format_info; > + if (afb->base.comm[0] != 'X') > + afb->base.format = format_info; > } > } > > -- > 2.25.1 >
On Sat, Jan 2, 2021 at 3:05 PM Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > > I think the problem here is that application A can set the FB and then > application B can use getfb2 (say ffmpeg). Yes. That, and also the check for 'X' won't get us far, because if i use my own software Psychtoolbox under Vulkan in direct display mode (leased RandR outputs), e.g., under radv or amdvlk, then the ->comm name will be "PTB mainthread" and 'P' != 'X'. But the Vulkan drivers all use legacy pageflips as well in der WSI/drm, so if Vulkan gets framebuffers with DCC modifiers, it would just fail the same way. Neither would it work to check for atomic client, as they sometimes use atomic commit ioctl only for certain use cases, e.g., setting HDR metadata, but still use the legacy pageflip ioctl for flipping. So that patch of mine is not the proper solution for anything non-X. > > https://lists.freedesktop.org/archives/dri-devel/2021-January/292761.html > would be my alternative patch. > I also produced and tested hopefully better alternative to my original one yesterday, but was too tired to send it. I just sent it out to you: https://lists.freedesktop.org/archives/dri-devel/2021-January/292763.html This one keeps the format_info check as is for non-atomic drivers, but no longer rejects pageflip if the underlying kms driver is atomic. I checked, and current atomic drivers use the drm_atomic... helper for implementing legacy pageflips, and that helper just wraps the pageflip into a "set new fb on plane" + atomic check + atomic commit. My understanding is that one can do these format changes safely under atomic commit, so i hope this would be safe and future proof. > (I'm not good at detecting the effects of tearing apparently but > tested this avoids the pageflip failure by debug-prints) XOrg log (e.g., ~/.local/share/xorg/XOrg0.log on current Ubuntu's) is a good place on native XOrg, where the amdgpu-ddx was flooding the log with present flip failures. Or drm.debug=4 for the kernel log. Piglit has the OML_sync_control tests for timing correctness, although they are mostly pointless if not run in fullscreen mode, which they are not by default. I can also highly recommend (sudo apt install octave-psychtoolbox-3) on Debian/Ubutu based systems for X11. It is used for neuroscience and medical research and critically depends on properly working pageflips and timestamping on native X11/GLX under OpenGL and recently also under Vulkan/WSI (radv,anv,amdvlk) in direct display mode. Working FOSS AMD and Intel are especially critical for this research, so far under X11+Mesa/OpenGL, but lately also under Vulkan direct display mode. It has many built-in correctness tests and will shout angrily if something software-detectable is broken wrt. pageflipping or timing. E.g., octave-cli --eval PerceptualVBLSyncTest PerceptualVBLSyncTest creates a flicker pattern that will tear like crazy under Mesa if pageflipping isn't used. Also good to test synchronization on dual-display setups, e.g., for udal-display stereo presentation. I was actually surprised that this patch made it through the various test suites and into drm-next. I thought page-flipping was covered well enough somewhere. Happy new year! -mario > > On Thu, Dec 31, 2020 at 9:52 PM Mario Kleiner > <mario.kleiner.de@gmail.com> wrote: > > > > Commit 816853f9dc4057b6c7ee3c45ca9bd5905 ("drm/amd/display: Set new > > format info for converted metadata.") may fix the getfb2 ioctl, but > > in exchange it completely breaks all pageflipping for classic user > > space, e.g., XOrg, as tested with both amdgpu-ddx and modesetting-ddx. > > This leads to massive tearing, broken visual timing/timestamping etc. > > > > Reason is that the classic pageflip ioctl doesn't allow a fb format > > change during flip, and at least X uses classic pageflip ioctl and no > > atomic modesetting api at all. > > > > As one attempted workaround, only set the new format info for converted > > metadata if the calling client isn't X. Not sure if this is the best > > way, or if a better check would not be "not all atomic clients" or > > similar? In any case it works for XOrg X-Server. Checking the ddx > > code of intel-ddx/modesetting-ddx/amdgpu-ddx as well as grepping over > > Mesa doesn't show any users of the getfb2 ioctl(), so the need for this > > format info assignment seems to be more the exception than the rule? > > > > Fixes: 816853f9dc40 ("drm/amd/display: Set new format info for converted metadata.") > > Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> > > Cc: Alex Deucher <alexander.deucher@amd.com> > > Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c > > index f764803c53a4..cb414b3d327a 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c > > @@ -828,7 +828,8 @@ static int convert_tiling_flags_to_modifier(struct amdgpu_framebuffer *afb) > > if (!format_info) > > return -EINVAL; > > > > - afb->base.format = format_info; > > + if (afb->base.comm[0] != 'X') > > + afb->base.format = format_info; > > } > > } > > > > -- > > 2.25.1 > >
On Sat, Jan 2, 2021 at 4:05 PM Mario Kleiner <mario.kleiner.de@gmail.com> wrote: > > On Sat, Jan 2, 2021 at 3:05 PM Bas Nieuwenhuizen > <bas@basnieuwenhuizen.nl> wrote: > > > > I think the problem here is that application A can set the FB and then > > application B can use getfb2 (say ffmpeg). > > > Yes. That, and also the check for 'X' won't get us far, because if i > use my own software Psychtoolbox under Vulkan in direct display mode > (leased RandR outputs), e.g., under radv or amdvlk, then the ->comm > name will be "PTB mainthread" and 'P' != 'X'. But the Vulkan drivers > all use legacy pageflips as well in der WSI/drm, so if Vulkan gets > framebuffers with DCC modifiers, it would just fail the same way. > > Neither would it work to check for atomic client, as they sometimes > use atomic commit ioctl only for certain use cases, e.g., setting HDR > metadata, but still use the legacy pageflip ioctl for flipping. > > So that patch of mine is not the proper solution for anything non-X. > > > > > https://lists.freedesktop.org/archives/dri-devel/2021-January/292761.html > > would be my alternative patch. > > > > I also produced and tested hopefully better alternative to my original > one yesterday, but was too tired to send it. I just sent it out to > you: > https://lists.freedesktop.org/archives/dri-devel/2021-January/292763.html > > This one keeps the format_info check as is for non-atomic drivers, but > no longer rejects pageflip if the underlying kms driver is atomic. I > checked, and current atomic drivers use the drm_atomic... helper for > implementing legacy pageflips, and that helper just wraps the pageflip > into a "set new fb on plane" + atomic check + atomic commit. > > My understanding is that one can do these format changes safely under > atomic commit, so i hope this would be safe and future proof. So I think the difference between your patch and mine seem to boil down to whether we want any uabi extension, since AFAICT none of the pre-atomic drivers support modifiers. > > > (I'm not good at detecting the effects of tearing apparently but > > tested this avoids the pageflip failure by debug-prints) > > > XOrg log (e.g., ~/.local/share/xorg/XOrg0.log on current Ubuntu's) is > a good place on native XOrg, where the amdgpu-ddx was flooding the log > with present flip failures. Or drm.debug=4 for the kernel log. > > Piglit has the OML_sync_control tests for timing correctness, although > they are mostly pointless if not run in fullscreen mode, which they > are not by default. > > I can also highly recommend (sudo apt install octave-psychtoolbox-3) > on Debian/Ubutu based systems for X11. It is used for neuroscience and > medical research and critically depends on properly working pageflips > and timestamping on native X11/GLX under OpenGL and recently also > under Vulkan/WSI (radv,anv,amdvlk) in direct display mode. Working > FOSS AMD and Intel are especially critical for this research, so far > under X11+Mesa/OpenGL, but lately also under Vulkan direct display > mode. It has many built-in correctness tests and will shout angrily if > something software-detectable is broken wrt. pageflipping or timing. > E.g., octave-cli --eval PerceptualVBLSyncTest > PerceptualVBLSyncTest creates a flicker pattern that will tear like > crazy under Mesa if pageflipping isn't used. Also good to test > synchronization on dual-display setups, e.g., for udal-display stereo > presentation. > > I was actually surprised that this patch made it through the various > test suites and into drm-next. I thought page-flipping was covered > well enough somewhere. I don't think there are any tests using the AMDGPU implicit modifiers (not in IGT for anything except linear at least) > > Happy new year! > -mario > > > > > On Thu, Dec 31, 2020 at 9:52 PM Mario Kleiner > > <mario.kleiner.de@gmail.com> wrote: > > > > > > Commit 816853f9dc4057b6c7ee3c45ca9bd5905 ("drm/amd/display: Set new > > > format info for converted metadata.") may fix the getfb2 ioctl, but > > > in exchange it completely breaks all pageflipping for classic user > > > space, e.g., XOrg, as tested with both amdgpu-ddx and modesetting-ddx. > > > This leads to massive tearing, broken visual timing/timestamping etc. > > > > > > Reason is that the classic pageflip ioctl doesn't allow a fb format > > > change during flip, and at least X uses classic pageflip ioctl and no > > > atomic modesetting api at all. > > > > > > As one attempted workaround, only set the new format info for converted > > > metadata if the calling client isn't X. Not sure if this is the best > > > way, or if a better check would not be "not all atomic clients" or > > > similar? In any case it works for XOrg X-Server. Checking the ddx > > > code of intel-ddx/modesetting-ddx/amdgpu-ddx as well as grepping over > > > Mesa doesn't show any users of the getfb2 ioctl(), so the need for this > > > format info assignment seems to be more the exception than the rule? > > > > > > Fixes: 816853f9dc40 ("drm/amd/display: Set new format info for converted metadata.") > > > Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> > > > Cc: Alex Deucher <alexander.deucher@amd.com> > > > Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> > > > --- > > > drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c > > > index f764803c53a4..cb414b3d327a 100644 > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c > > > @@ -828,7 +828,8 @@ static int convert_tiling_flags_to_modifier(struct amdgpu_framebuffer *afb) > > > if (!format_info) > > > return -EINVAL; > > > > > > - afb->base.format = format_info; > > > + if (afb->base.comm[0] != 'X') > > > + afb->base.format = format_info; > > > } > > > } > > > > > > -- > > > 2.25.1 > > >
On Sat, Jan 2, 2021 at 4:49 PM Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > > On Sat, Jan 2, 2021 at 4:05 PM Mario Kleiner <mario.kleiner.de@gmail.com> wrote: > > > > On Sat, Jan 2, 2021 at 3:05 PM Bas Nieuwenhuizen > > <bas@basnieuwenhuizen.nl> wrote: > > > > > > I think the problem here is that application A can set the FB and then > > > application B can use getfb2 (say ffmpeg). > > > > > > Yes. That, and also the check for 'X' won't get us far, because if i > > use my own software Psychtoolbox under Vulkan in direct display mode > > (leased RandR outputs), e.g., under radv or amdvlk, then the ->comm > > name will be "PTB mainthread" and 'P' != 'X'. But the Vulkan drivers > > all use legacy pageflips as well in der WSI/drm, so if Vulkan gets > > framebuffers with DCC modifiers, it would just fail the same way. > > > > Neither would it work to check for atomic client, as they sometimes > > use atomic commit ioctl only for certain use cases, e.g., setting HDR > > metadata, but still use the legacy pageflip ioctl for flipping. > > > > So that patch of mine is not the proper solution for anything non-X. > > > > > > > > https://lists.freedesktop.org/archives/dri-devel/2021-January/292761.html > > > would be my alternative patch. > > > > > > > I also produced and tested hopefully better alternative to my original > > one yesterday, but was too tired to send it. I just sent it out to > > you: > > https://lists.freedesktop.org/archives/dri-devel/2021-January/292763.html > > > > This one keeps the format_info check as is for non-atomic drivers, but > > no longer rejects pageflip if the underlying kms driver is atomic. I > > checked, and current atomic drivers use the drm_atomic... helper for > > implementing legacy pageflips, and that helper just wraps the pageflip > > into a "set new fb on plane" + atomic check + atomic commit. > > > > My understanding is that one can do these format changes safely under > > atomic commit, so i hope this would be safe and future proof. > > So I think the difference between your patch and mine seem to boil > down to whether we want any uabi extension, since AFAICT none of the > pre-atomic drivers support modifiers. > That's a point: Although the uabi extension would only relax rules, instead of tighten them, so current drm clients would be ok, i guess. Afaict the current non-atomic modesetting drivers are: gma500, shmobile, radeon, nouveau, amdgpu non-DC. gma500, shmobile and radeon don't use modifiers, and probably won't get any in the future? Also amdgpu without DC? Atm. you only enable explicit modifiers for >= FAMILY_AI, ie. Vega and later, and DC is a requirement for Vega and later, so modifiers ==> DC ==> atomic. But some of your code was moved from amdgpu_dm to amdgpu_display specifically to allow compiling it without DC, and any client i tested apart from Waylands weston (and that only for cursor planes) didn't use addfb2 ioctl with modifiers at all, so all of X and Vulkan currently hits the new convert_tiling_flags_to_modifier() fallback code that converts old style tiling flags into modifiers. That fallback path is the reason for triggering this issue in the first place, as it converts some tiling flags to DCC/DCC-retile modifiers and therefore changes the format_info. Modifiers are only enabled if DC is on. So as long as nobody decides to add modifiers in the legacy non-DC kms path, we'd be ok. I'm less sure about nouveau. It uses modifiers, but has atomic support only on nv50+ and that atomic support is off by default -- needs a nouveau.nouveau_atomic=1 boot parameter to switch it on. It seems to enable modifier support unconditionally regardless if atomic or not, see: https://elixir.bootlin.com/linux/v5.11-rc1/source/drivers/gpu/drm/nouveau/nouveau_display.c#L703 Atm. nouveau doesn't assign a new format_info though, so wouldn't trigger this issue atm. So i think the decision is about relaxing uabi a bit with my patch, vs. less future-proofing with your patch? Atm. both patches would solve the immediate problem, which is very serious for my users' use cases, so I'd be ok with any of them. I just don't want this issue to repeat in the future. Tracking it down killed almost two full days for me, although I involuntarily learned more about the current state of modifiers in kernel and user space than I ever thought I wanted to know :/. -mario > > > > > (I'm not good at detecting the effects of tearing apparently but > > > tested this avoids the pageflip failure by debug-prints) > > > > > > XOrg log (e.g., ~/.local/share/xorg/XOrg0.log on current Ubuntu's) is > > a good place on native XOrg, where the amdgpu-ddx was flooding the log > > with present flip failures. Or drm.debug=4 for the kernel log. > > > > Piglit has the OML_sync_control tests for timing correctness, although > > they are mostly pointless if not run in fullscreen mode, which they > > are not by default. > > > > I can also highly recommend (sudo apt install octave-psychtoolbox-3) > > on Debian/Ubutu based systems for X11. It is used for neuroscience and > > medical research and critically depends on properly working pageflips > > and timestamping on native X11/GLX under OpenGL and recently also > > under Vulkan/WSI (radv,anv,amdvlk) in direct display mode. Working > > FOSS AMD and Intel are especially critical for this research, so far > > under X11+Mesa/OpenGL, but lately also under Vulkan direct display > > mode. It has many built-in correctness tests and will shout angrily if > > something software-detectable is broken wrt. pageflipping or timing. > > E.g., octave-cli --eval PerceptualVBLSyncTest > > PerceptualVBLSyncTest creates a flicker pattern that will tear like > > crazy under Mesa if pageflipping isn't used. Also good to test > > synchronization on dual-display setups, e.g., for udal-display stereo > > presentation. > > > > I was actually surprised that this patch made it through the various > > test suites and into drm-next. I thought page-flipping was covered > > well enough somewhere. > > I don't think there are any tests using the AMDGPU implicit modifiers > (not in IGT for anything except linear at least) > > > > > Happy new year! > > -mario > > > > >
On Sat, Jan 2, 2021 at 1:35 PM Mario Kleiner <mario.kleiner.de@gmail.com> wrote: > I'm less sure about nouveau. It uses modifiers, but has atomic support > only on nv50+ and that atomic support is off by default -- needs a > nouveau.nouveau_atomic=1 boot parameter to switch it on. It seems to > enable modifier support unconditionally regardless if atomic or not, > see: > https://elixir.bootlin.com/linux/v5.11-rc1/source/drivers/gpu/drm/nouveau/nouveau_display.c#L703 > > Atm. nouveau doesn't assign a new format_info though, so wouldn't > trigger this issue atm. Note that pre-nv50, no modifiers exist. Also, drm_drv_uses_atomic_modeset() doesn't care whether the client is an atomic client or not. It will return true for nv50+ no matter what. nouveau_atomic=1 affects whether atomic UAPI is exposed. Not sure if this impacts your discussion. Cheers, -ilia
On Sat, Jan 2, 2021 at 7:51 PM Ilia Mirkin <imirkin@alum.mit.edu> wrote: > On Sat, Jan 2, 2021 at 1:35 PM Mario Kleiner <mario.kleiner.de@gmail.com> > wrote: > > I'm less sure about nouveau. It uses modifiers, but has atomic support > > only on nv50+ and that atomic support is off by default -- needs a > > nouveau.nouveau_atomic=1 boot parameter to switch it on. It seems to > > enable modifier support unconditionally regardless if atomic or not, > > see: > > > https://elixir.bootlin.com/linux/v5.11-rc1/source/drivers/gpu/drm/nouveau/nouveau_display.c#L703 > > > > Atm. nouveau doesn't assign a new format_info though, so wouldn't > > trigger this issue atm. > > Note that pre-nv50, no modifiers exist. Also, > drm_drv_uses_atomic_modeset() doesn't care whether the client is an > atomic client or not. It will return true for nv50+ no matter what. > nouveau_atomic=1 affects whether atomic UAPI is exposed. Not sure if > this impacts your discussion. > > Thanks Ilia. So nouveau is fine in any case, as nv50 => modifiers and atomic commit even if atomic UAPI is off. Also drm_drv_uses_atomic_modeset() is the right choice, as my patch should check if the atomic driver uses atomic commit, it doesn't care about atomic UAPI or the client being atomic. -mario
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c index f764803c53a4..cb414b3d327a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c @@ -828,7 +828,8 @@ static int convert_tiling_flags_to_modifier(struct amdgpu_framebuffer *afb) if (!format_info) return -EINVAL; - afb->base.format = format_info; + if (afb->base.comm[0] != 'X') + afb->base.format = format_info; } }
Commit 816853f9dc4057b6c7ee3c45ca9bd5905 ("drm/amd/display: Set new format info for converted metadata.") may fix the getfb2 ioctl, but in exchange it completely breaks all pageflipping for classic user space, e.g., XOrg, as tested with both amdgpu-ddx and modesetting-ddx. This leads to massive tearing, broken visual timing/timestamping etc. Reason is that the classic pageflip ioctl doesn't allow a fb format change during flip, and at least X uses classic pageflip ioctl and no atomic modesetting api at all. As one attempted workaround, only set the new format info for converted metadata if the calling client isn't X. Not sure if this is the best way, or if a better check would not be "not all atomic clients" or similar? In any case it works for XOrg X-Server. Checking the ddx code of intel-ddx/modesetting-ddx/amdgpu-ddx as well as grepping over Mesa doesn't show any users of the getfb2 ioctl(), so the need for this format info assignment seems to be more the exception than the rule? Fixes: 816853f9dc40 ("drm/amd/display: Set new format info for converted metadata.") Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)