diff mbox series

[3/4] drm: introduce DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP

Message ID 20220824150834.427572-4-contact@emersion.fr (mailing list archive)
State New, archived
Headers show
Series Add support for atomic async page-flips | expand

Commit Message

Simon Ser Aug. 24, 2022, 3:08 p.m. UTC
This new kernel capability indicates whether async page-flips are
supported via the atomic uAPI. DRM clients can use it to check
for support before feeding DRM_MODE_PAGE_FLIP_ASYNC to the kernel.

Make it clear that DRM_CAP_ASYNC_PAGE_FLIP is for legacy uAPI only.

Signed-off-by: Simon Ser <contact@emersion.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Harry Wentland <hwentlan@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
---
 drivers/gpu/drm/drm_ioctl.c |  5 +++++
 include/uapi/drm/drm.h      | 10 +++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

Comments

Ville Syrjälä Aug. 26, 2022, 8:19 a.m. UTC | #1
On Wed, Aug 24, 2022 at 03:08:55PM +0000, Simon Ser wrote:
> This new kernel capability indicates whether async page-flips are
> supported via the atomic uAPI. DRM clients can use it to check
> for support before feeding DRM_MODE_PAGE_FLIP_ASYNC to the kernel.

I think we'd need to clarify the semantics of the async flag
for atomic commits.

Eg. on Intel hw only pure page flips are possible async, if you do
anything else (change plane size/pos/scaling/etc.) you will need
to do a sync update. Technically not even all page flips (from the
uapi POV) might be possible as the exact scanout source address
is specified via two registers, only one of which can be update
async. So technically the two framebuffers might be laid out
just slightly differently which could prevent an async flip.
Also only some subset of planes actually support async flips.

And on hw where multiple planes support it on the same crtc, only one
plane can do it at a time. Well, more accurately we can only select
one plane at a time to give us the "flip done" interrupt. I guess
if the user wants to async flip multiple planes at the same time
we could do them serially as opposed to in parallel to make sure
all the flips actually happened before we signal completion of the
entire commit. Async flips of multiple planes probably won't
happen atomically anyway so doing them serially seems fine.

ATM in i915 we probably don't have sufficient state checks in
place to catch all the restrictions, and instead in part we rely
on the limited scope of the legacy async flip ioctl to make sure
the operation doesn't attempt something the hw can't do.

> Make it clear that DRM_CAP_ASYNC_PAGE_FLIP is for legacy uAPI only.
> 
> Signed-off-by: Simon Ser <contact@emersion.fr>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Joshua Ashton <joshua@froggi.es>
> Cc: Melissa Wen <mwen@igalia.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Harry Wentland <hwentlan@amd.com>
> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> ---
>  drivers/gpu/drm/drm_ioctl.c |  5 +++++
>  include/uapi/drm/drm.h      | 10 +++++++++-
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> index ca2a6e6101dc..5b1591e2b46c 100644
> --- a/drivers/gpu/drm/drm_ioctl.c
> +++ b/drivers/gpu/drm/drm_ioctl.c
> @@ -302,6 +302,11 @@ static int drm_getcap(struct drm_device *dev, void *data, struct drm_file *file_
>  	case DRM_CAP_CRTC_IN_VBLANK_EVENT:
>  		req->value = 1;
>  		break;
> +	case DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP:
> +		req->value = drm_core_check_feature(dev, DRIVER_ATOMIC) &&
> +			     dev->mode_config.async_page_flip &&
> +			     !dev->mode_config.atomic_async_page_flip_not_supported;
> +		break;
>  	default:
>  		return -EINVAL;
>  	}
> diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
> index 642808520d92..b1962628ecda 100644
> --- a/include/uapi/drm/drm.h
> +++ b/include/uapi/drm/drm.h
> @@ -706,7 +706,8 @@ struct drm_gem_open {
>  /**
>   * DRM_CAP_ASYNC_PAGE_FLIP
>   *
> - * If set to 1, the driver supports &DRM_MODE_PAGE_FLIP_ASYNC.
> + * If set to 1, the driver supports &DRM_MODE_PAGE_FLIP_ASYNC for legacy
> + * page-flips.
>   */
>  #define DRM_CAP_ASYNC_PAGE_FLIP		0x7
>  /**
> @@ -767,6 +768,13 @@ struct drm_gem_open {
>   * Documentation/gpu/drm-mm.rst, section "DRM Sync Objects".
>   */
>  #define DRM_CAP_SYNCOBJ_TIMELINE	0x14
> +/**
> + * DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP
> + *
> + * If set to 1, the driver supports &DRM_MODE_PAGE_FLIP_ASYNC for atomic
> + * commits.
> + */
> +#define DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP	0x15
>  
>  /* DRM_IOCTL_GET_CAP ioctl argument type */
>  struct drm_get_cap {
> -- 
> 2.37.2
>
Simon Ser Aug. 29, 2022, 4:01 p.m. UTC | #2
On Friday, August 26th, 2022 at 10:19, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:

> On Wed, Aug 24, 2022 at 03:08:55PM +0000, Simon Ser wrote:
> > This new kernel capability indicates whether async page-flips are
> > supported via the atomic uAPI. DRM clients can use it to check
> > for support before feeding DRM_MODE_PAGE_FLIP_ASYNC to the kernel.
> 
> I think we'd need to clarify the semantics of the async flag
> for atomic commits.
> 
> Eg. on Intel hw only pure page flips are possible async, if you do
> anything else (change plane size/pos/scaling/etc.) you will need
> to do a sync update. Technically not even all page flips (from the
> uapi POV) might be possible as the exact scanout source address
> is specified via two registers, only one of which can be update
> async. So technically the two framebuffers might be laid out
> just slightly differently which could prevent an async flip.
> Also only some subset of planes actually support async flips.

Also IIRC some format modifiers don't support async flip at all (CCS)?

> And on hw where multiple planes support it on the same crtc, only one
> plane can do it at a time. Well, more accurately we can only select
> one plane at a time to give us the "flip done" interrupt. I guess
> if the user wants to async flip multiple planes at the same time
> we could do them serially as opposed to in parallel to make sure
> all the flips actually happened before we signal completion of the
> entire commit. Async flips of multiple planes probably won't
> happen atomically anyway so doing them serially seems fine.
> 
> ATM in i915 we probably don't have sufficient state checks in
> place to catch all the restrictions, and instead in part we rely
> on the limited scope of the legacy async flip ioctl to make sure
> the operation doesn't attempt something the hw can't do.

Yeah, that makes sense.

In the documentation patch discussion [1], it appears it's not clear what
drivers should do when async flip isn't possible with the legacy uAPI.

For the atomic uAPI, we need to pick on of these two approaches:

1. Let the kernel fall back to a sync flip if async isn't possible. This
   simplifies user-space, but then user-space has no reliable way to figure out
   what really happened (sync or async?). That could be fixed with a new
   read-only CRTC prop indicating whether the last flip was async or not.
   However, maybe someone will come up in the future with user-space which
   needs to only apply an update if async flip is possible, in which case this
   approach falls short.
2. Make the kernel return EINVAL if async flip isn't possible. This adds more
   complexity to user-space, but enables a more reliable and deterministic
   uAPI. This is also more consistent with the rest of the existing atomic
   uAPI.

Note, current user-space would only need to opportunistically enable async
flip. IOW, I think that for current user-space plans "async if possible,
otherwise sync" is good enough. That behavior maps well to the Vulkan present
modes as well (which says that "this mode *may* result in visible tearing", but
doesn't guarantee it).

Another possible shortcoming of the proposed new uAPI here is that user-space
cannot submit a single atomic commit which updates multiple CRTCs, and
individually select which CRTC does an async flip. This could be fixed with
a "ASYNC_FLIP" CRTC prop which the kernel always resets to 0 on commit. I'm not
sure we want/need to cross that bridge right now, it would be easy enough to
add as a second step if some user-space would benefit from it.

What do you think?

[1]: https://lore.kernel.org/dri-devel/ASSNOUe9wtsXskZjVlf1X4pl53T7pVE0MfEzkQ_h4cX0tjnF7e3cxpwGpRNPudmIHoRuW4kz_v1AeTpXgouLpTYcI8q-lPTzc1YMLR8JiJM=@emersion.fr/
Ville Syrjälä Aug. 30, 2022, 8:08 a.m. UTC | #3
On Mon, Aug 29, 2022 at 04:01:44PM +0000, Simon Ser wrote:
> On Friday, August 26th, 2022 at 10:19, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> 
> > On Wed, Aug 24, 2022 at 03:08:55PM +0000, Simon Ser wrote:
> > > This new kernel capability indicates whether async page-flips are
> > > supported via the atomic uAPI. DRM clients can use it to check
> > > for support before feeding DRM_MODE_PAGE_FLIP_ASYNC to the kernel.
> > 
> > I think we'd need to clarify the semantics of the async flag
> > for atomic commits.
> > 
> > Eg. on Intel hw only pure page flips are possible async, if you do
> > anything else (change plane size/pos/scaling/etc.) you will need
> > to do a sync update. Technically not even all page flips (from the
> > uapi POV) might be possible as the exact scanout source address
> > is specified via two registers, only one of which can be update
> > async. So technically the two framebuffers might be laid out
> > just slightly differently which could prevent an async flip.
> > Also only some subset of planes actually support async flips.
> 
> Also IIRC some format modifiers don't support async flip at all (CCS)?

Yeah, that too. Also planar YUV formats aren't allowed.

> 
> > And on hw where multiple planes support it on the same crtc, only one
> > plane can do it at a time. Well, more accurately we can only select
> > one plane at a time to give us the "flip done" interrupt. I guess
> > if the user wants to async flip multiple planes at the same time
> > we could do them serially as opposed to in parallel to make sure
> > all the flips actually happened before we signal completion of the
> > entire commit. Async flips of multiple planes probably won't
> > happen atomically anyway so doing them serially seems fine.
> > 
> > ATM in i915 we probably don't have sufficient state checks in
> > place to catch all the restrictions, and instead in part we rely
> > on the limited scope of the legacy async flip ioctl to make sure
> > the operation doesn't attempt something the hw can't do.
> 
> Yeah, that makes sense.
> 
> In the documentation patch discussion [1], it appears it's not clear what
> drivers should do when async flip isn't possible with the legacy uAPI.
> 
> For the atomic uAPI, we need to pick on of these two approaches:
> 
> 1. Let the kernel fall back to a sync flip if async isn't possible. This
>    simplifies user-space, but then user-space has no reliable way to figure out
>    what really happened (sync or async?). That could be fixed with a new
>    read-only CRTC prop indicating whether the last flip was async or not.
>    However, maybe someone will come up in the future with user-space which
>    needs to only apply an update if async flip is possible, in which case this
>    approach falls short.
> 2. Make the kernel return EINVAL if async flip isn't possible. This adds more
>    complexity to user-space, but enables a more reliable and deterministic
>    uAPI. This is also more consistent with the rest of the existing atomic
>    uAPI.

The current behaviour is somewhat a combination of the two. We return
an error if async flip is not possible at all given the current state.

When async flip is possible we return success, but may still internally
fall back to a sync flip for the first flip. That is required on some
borked hardware that can't switch from sync flips to async flips without
doing an extra sync flip. Also on some other hardware we intentionally
fall back to a sync flip for the first async flip, so that we can
reprogram some display FIFO stuff (aimed to make the subsequent async
flips faster).

> 
> Note, current user-space would only need to opportunistically enable async
> flip. IOW, I think that for current user-space plans "async if possible,
> otherwise sync" is good enough. That behavior maps well to the Vulkan present
> modes as well (which says that "this mode *may* result in visible tearing", but
> doesn't guarantee it).

The current behaviour is to fall back to a blit if the async
flip fails. So you still get the same effective behaviour, just
not as efficient. I think that's a reasonable way to handle it.

> 
> Another possible shortcoming of the proposed new uAPI here is that user-space
> cannot submit a single atomic commit which updates multiple CRTCs, and
> individually select which CRTC does an async flip. This could be fixed with
> a "ASYNC_FLIP" CRTC prop which the kernel always resets to 0 on commit. I'm not
> sure we want/need to cross that bridge right now, it would be easy enough to
> add as a second step if some user-space would benefit from it.

Technically it should really be per-plane since that is what does
the flip. But I have a feeling that allowing a mix of async and
sync in the same commit is just going to make everything more
complicated without really helping anything (async flips won't
happen atomically anyway with anything else).

One (crazy?) idea I had for the atomic api is that we could even
reject most of the properties already on the uapi level before anyone
gets to examine the final state. Ie. as soon as the atomic ioctl sees
eg. a gamma LUT property change it would just immediately return
an error if async flip is also requested.

> 
> What do you think?
> 
> [1]: https://lore.kernel.org/dri-devel/ASSNOUe9wtsXskZjVlf1X4pl53T7pVE0MfEzkQ_h4cX0tjnF7e3cxpwGpRNPudmIHoRuW4kz_v1AeTpXgouLpTYcI8q-lPTzc1YMLR8JiJM=@emersion.fr/
Pekka Paalanen Aug. 30, 2022, 8:40 a.m. UTC | #4
On Tue, 30 Aug 2022 11:08:22 +0300
Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:

> On Mon, Aug 29, 2022 at 04:01:44PM +0000, Simon Ser wrote:
> > On Friday, August 26th, 2022 at 10:19, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> >   
> > > On Wed, Aug 24, 2022 at 03:08:55PM +0000, Simon Ser wrote:  
> > > > This new kernel capability indicates whether async page-flips are
> > > > supported via the atomic uAPI. DRM clients can use it to check
> > > > for support before feeding DRM_MODE_PAGE_FLIP_ASYNC to the kernel.  
> > > 
> > > I think we'd need to clarify the semantics of the async flag
> > > for atomic commits.
> > > 
> > > Eg. on Intel hw only pure page flips are possible async, if you do
> > > anything else (change plane size/pos/scaling/etc.) you will need
> > > to do a sync update. Technically not even all page flips (from the
> > > uapi POV) might be possible as the exact scanout source address
> > > is specified via two registers, only one of which can be update
> > > async. So technically the two framebuffers might be laid out
> > > just slightly differently which could prevent an async flip.
> > > Also only some subset of planes actually support async flips.  
> > 
> > Also IIRC some format modifiers don't support async flip at all (CCS)?  
> 
> Yeah, that too. Also planar YUV formats aren't allowed.
> 
> >   
> > > And on hw where multiple planes support it on the same crtc, only one
> > > plane can do it at a time. Well, more accurately we can only select
> > > one plane at a time to give us the "flip done" interrupt. I guess
> > > if the user wants to async flip multiple planes at the same time
> > > we could do them serially as opposed to in parallel to make sure
> > > all the flips actually happened before we signal completion of the
> > > entire commit. Async flips of multiple planes probably won't
> > > happen atomically anyway so doing them serially seems fine.
> > > 
> > > ATM in i915 we probably don't have sufficient state checks in
> > > place to catch all the restrictions, and instead in part we rely
> > > on the limited scope of the legacy async flip ioctl to make sure
> > > the operation doesn't attempt something the hw can't do.  
> > 
> > Yeah, that makes sense.
> > 
> > In the documentation patch discussion [1], it appears it's not clear what
> > drivers should do when async flip isn't possible with the legacy uAPI.
> > 
> > For the atomic uAPI, we need to pick on of these two approaches:
> > 
> > 1. Let the kernel fall back to a sync flip if async isn't possible. This
> >    simplifies user-space, but then user-space has no reliable way to figure out
> >    what really happened (sync or async?). That could be fixed with a new
> >    read-only CRTC prop indicating whether the last flip was async or not.
> >    However, maybe someone will come up in the future with user-space which
> >    needs to only apply an update if async flip is possible, in which case this
> >    approach falls short.

There is the pageflip completion timestamp in the DRM event. If
userspace knows the phase and period of the scanout cycle, the
completion timestamp should tell quite reliably if the update was
tearing.

For the phase, one can query KMS for the last vblank timestamp. This
should work also for VRR I assume.

For the period, fixed-frequency video mode has it straight. VRR gives
only a range or a minimum period.

> > 2. Make the kernel return EINVAL if async flip isn't possible. This adds more
> >    complexity to user-space, but enables a more reliable and deterministic
> >    uAPI. This is also more consistent with the rest of the existing atomic
> >    uAPI.  
> 
> The current behaviour is somewhat a combination of the two. We return
> an error if async flip is not possible at all given the current state.
> 
> When async flip is possible we return success, but may still internally
> fall back to a sync flip for the first flip. That is required on some
> borked hardware that can't switch from sync flips to async flips without
> doing an extra sync flip. Also on some other hardware we intentionally
> fall back to a sync flip for the first async flip, so that we can
> reprogram some display FIFO stuff (aimed to make the subsequent async
> flips faster).

Oh, so userspace should expect to run async for long periods of time,
and not use async this frame, sync next, then async again depending on
per-frame timings.

That seems important to note.

It's almost like the async flag should be a KMS property instead of a
commit ioctl flag.

> > Note, current user-space would only need to opportunistically enable async
> > flip. IOW, I think that for current user-space plans "async if possible,
> > otherwise sync" is good enough. That behavior maps well to the Vulkan present
> > modes as well (which says that "this mode *may* result in visible tearing", but
> > doesn't guarantee it).  
> 
> The current behaviour is to fall back to a blit if the async
> flip fails. So you still get the same effective behaviour, just
> not as efficient. I think that's a reasonable way to handle it.

That's purely an Xorg thing, right?

Should Wayland compositors implement the same thing is a good question.

> > Another possible shortcoming of the proposed new uAPI here is that user-space
> > cannot submit a single atomic commit which updates multiple CRTCs, and
> > individually select which CRTC does an async flip. This could be fixed with

I would think that you can just do per-CRTC atomic commits in that
case. You would do per-CRTC atomic commits anyway when the vblanks do
not coincide. I expect tearing updates to have unpredictable latency,
especially if they can silently fall back to sync flips, so doing
multi-CRTC async flips is not useful.

> > a "ASYNC_FLIP" CRTC prop which the kernel always resets to 0 on commit. I'm not
> > sure we want/need to cross that bridge right now, it would be easy enough to
> > add as a second step if some user-space would benefit from it.  
> 
> Technically it should really be per-plane since that is what does
> the flip. But I have a feeling that allowing a mix of async and
> sync in the same commit is just going to make everything more
> complicated without really helping anything (async flips won't
> happen atomically anyway with anything else).
> 
> One (crazy?) idea I had for the atomic api is that we could even
> reject most of the properties already on the uapi level before anyone
> gets to examine the final state. Ie. as soon as the atomic ioctl sees
> eg. a gamma LUT property change it would just immediately return
> an error if async flip is also requested.

I agree with these two paragraphs.

What about limiting async flag to atomic commits that update only a
single KMS plane (regardless of how many planes are active)? Maybe that
would be a good first step?

> 
> > 
> > What do you think?
> > 
> > [1]: https://lore.kernel.org/dri-devel/ASSNOUe9wtsXskZjVlf1X4pl53T7pVE0MfEzkQ_h4cX0tjnF7e3cxpwGpRNPudmIHoRuW4kz_v1AeTpXgouLpTYcI8q-lPTzc1YMLR8JiJM=@emersion.fr/  
> 

Thanks,
pq
Michel Dänzer Aug. 30, 2022, 8:41 a.m. UTC | #5
On 2022-08-29 18:01, Simon Ser wrote:
> On Friday, August 26th, 2022 at 10:19, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
>> On Wed, Aug 24, 2022 at 03:08:55PM +0000, Simon Ser wrote:
>>> This new kernel capability indicates whether async page-flips are
>>> supported via the atomic uAPI. DRM clients can use it to check
>>> for support before feeding DRM_MODE_PAGE_FLIP_ASYNC to the kernel.
>>
>> I think we'd need to clarify the semantics of the async flag
>> for atomic commits.
>>
>> Eg. on Intel hw only pure page flips are possible async, if you do
>> anything else (change plane size/pos/scaling/etc.) you will need
>> to do a sync update. Technically not even all page flips (from the
>> uapi POV) might be possible as the exact scanout source address
>> is specified via two registers, only one of which can be update
>> async. So technically the two framebuffers might be laid out
>> just slightly differently which could prevent an async flip.
>> Also only some subset of planes actually support async flips.
> 
> Also IIRC some format modifiers don't support async flip at all (CCS)?
> 
>> And on hw where multiple planes support it on the same crtc, only one
>> plane can do it at a time. Well, more accurately we can only select
>> one plane at a time to give us the "flip done" interrupt. I guess
>> if the user wants to async flip multiple planes at the same time
>> we could do them serially as opposed to in parallel to make sure
>> all the flips actually happened before we signal completion of the
>> entire commit. Async flips of multiple planes probably won't
>> happen atomically anyway so doing them serially seems fine.
>>
>> ATM in i915 we probably don't have sufficient state checks in
>> place to catch all the restrictions, and instead in part we rely
>> on the limited scope of the legacy async flip ioctl to make sure
>> the operation doesn't attempt something the hw can't do.
> 
> Yeah, that makes sense.
> 
> In the documentation patch discussion [1], it appears it's not clear what
> drivers should do when async flip isn't possible with the legacy uAPI.
> 
> For the atomic uAPI, we need to pick on of these two approaches:
> 
> 1. Let the kernel fall back to a sync flip if async isn't possible. This
>    simplifies user-space, but then user-space has no reliable way to figure out
>    what really happened (sync or async?). That could be fixed with a new
>    read-only CRTC prop indicating whether the last flip was async or not.
>    However, maybe someone will come up in the future with user-space which
>    needs to only apply an update if async flip is possible, in which case this
>    approach falls short.

The future is now. :)

As I described in the documentation patch discussion, this approach would make it tricky for a Wayland compositor to decide if it should use an async commit (which needs to be done ASAP to serve the intended purpose) or not (in which case the compositor may want to delay the commit as long as possible for minimal latency).


> Another possible shortcoming of the proposed new uAPI here is that user-space
> cannot submit a single atomic commit which updates multiple CRTCs, and
> individually select which CRTC does an async flip. This could be fixed with
> a "ASYNC_FLIP" CRTC prop which the kernel always resets to 0 on commit. I'm not
> sure we want/need to cross that bridge right now, it would be easy enough to
> add as a second step if some user-space would benefit from it.

I thought about this as well, but I came to the conclusion it shouldn't be needed. User space can do one commit for the sync CRTC/s planes first and another commit for the async ones afterwards.
Ville Syrjälä Aug. 30, 2022, 10:24 a.m. UTC | #6
On Tue, Aug 30, 2022 at 11:40:10AM +0300, Pekka Paalanen wrote:
> On Tue, 30 Aug 2022 11:08:22 +0300
> Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> 
> > On Mon, Aug 29, 2022 at 04:01:44PM +0000, Simon Ser wrote:
> > > On Friday, August 26th, 2022 at 10:19, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> > >   
> > > > On Wed, Aug 24, 2022 at 03:08:55PM +0000, Simon Ser wrote:  
> > > > > This new kernel capability indicates whether async page-flips are
> > > > > supported via the atomic uAPI. DRM clients can use it to check
> > > > > for support before feeding DRM_MODE_PAGE_FLIP_ASYNC to the kernel.  
> > > > 
> > > > I think we'd need to clarify the semantics of the async flag
> > > > for atomic commits.
> > > > 
> > > > Eg. on Intel hw only pure page flips are possible async, if you do
> > > > anything else (change plane size/pos/scaling/etc.) you will need
> > > > to do a sync update. Technically not even all page flips (from the
> > > > uapi POV) might be possible as the exact scanout source address
> > > > is specified via two registers, only one of which can be update
> > > > async. So technically the two framebuffers might be laid out
> > > > just slightly differently which could prevent an async flip.
> > > > Also only some subset of planes actually support async flips.  
> > > 
> > > Also IIRC some format modifiers don't support async flip at all (CCS)?  
> > 
> > Yeah, that too. Also planar YUV formats aren't allowed.
> > 
> > >   
> > > > And on hw where multiple planes support it on the same crtc, only one
> > > > plane can do it at a time. Well, more accurately we can only select
> > > > one plane at a time to give us the "flip done" interrupt. I guess
> > > > if the user wants to async flip multiple planes at the same time
> > > > we could do them serially as opposed to in parallel to make sure
> > > > all the flips actually happened before we signal completion of the
> > > > entire commit. Async flips of multiple planes probably won't
> > > > happen atomically anyway so doing them serially seems fine.
> > > > 
> > > > ATM in i915 we probably don't have sufficient state checks in
> > > > place to catch all the restrictions, and instead in part we rely
> > > > on the limited scope of the legacy async flip ioctl to make sure
> > > > the operation doesn't attempt something the hw can't do.  
> > > 
> > > Yeah, that makes sense.
> > > 
> > > In the documentation patch discussion [1], it appears it's not clear what
> > > drivers should do when async flip isn't possible with the legacy uAPI.
> > > 
> > > For the atomic uAPI, we need to pick on of these two approaches:
> > > 
> > > 1. Let the kernel fall back to a sync flip if async isn't possible. This
> > >    simplifies user-space, but then user-space has no reliable way to figure out
> > >    what really happened (sync or async?). That could be fixed with a new
> > >    read-only CRTC prop indicating whether the last flip was async or not.
> > >    However, maybe someone will come up in the future with user-space which
> > >    needs to only apply an update if async flip is possible, in which case this
> > >    approach falls short.
> 
> There is the pageflip completion timestamp in the DRM event. If
> userspace knows the phase and period of the scanout cycle, the
> completion timestamp should tell quite reliably if the update was
> tearing.
> 
> For the phase, one can query KMS for the last vblank timestamp. This
> should work also for VRR I assume.
> 
> For the period, fixed-frequency video mode has it straight. VRR gives
> only a range or a minimum period.
> 
> > > 2. Make the kernel return EINVAL if async flip isn't possible. This adds more
> > >    complexity to user-space, but enables a more reliable and deterministic
> > >    uAPI. This is also more consistent with the rest of the existing atomic
> > >    uAPI.  
> > 
> > The current behaviour is somewhat a combination of the two. We return
> > an error if async flip is not possible at all given the current state.
> > 
> > When async flip is possible we return success, but may still internally
> > fall back to a sync flip for the first flip. That is required on some
> > borked hardware that can't switch from sync flips to async flips without
> > doing an extra sync flip. Also on some other hardware we intentionally
> > fall back to a sync flip for the first async flip, so that we can
> > reprogram some display FIFO stuff (aimed to make the subsequent async
> > flips faster).
> 
> Oh, so userspace should expect to run async for long periods of time,
> and not use async this frame, sync next, then async again depending on
> per-frame timings.
> 
> That seems important to note.

Yeah, our hw can't really do the "tear if late, otherwise do a sync flip"
behaviour. GLX_swap_control_tear I think was the glx extension for that.

> 
> It's almost like the async flag should be a KMS property instead of a
> commit ioctl flag.

I guess that would be one way to implement it.

> 
> > > Note, current user-space would only need to opportunistically enable async
> > > flip. IOW, I think that for current user-space plans "async if possible,
> > > otherwise sync" is good enough. That behavior maps well to the Vulkan present
> > > modes as well (which says that "this mode *may* result in visible tearing", but
> > > doesn't guarantee it).  
> > 
> > The current behaviour is to fall back to a blit if the async
> > flip fails. So you still get the same effective behaviour, just
> > not as efficient. I think that's a reasonable way to handle it.
> 
> That's purely an Xorg thing, right?

Yes.

> 
> Should Wayland compositors implement the same thing is a good question.

Is the whole tear+wayland situation more or less up in the air still?

> 
> > > Another possible shortcoming of the proposed new uAPI here is that user-space
> > > cannot submit a single atomic commit which updates multiple CRTCs, and
> > > individually select which CRTC does an async flip. This could be fixed with
> 
> I would think that you can just do per-CRTC atomic commits in that
> case. You would do per-CRTC atomic commits anyway when the vblanks do
> not coincide. I expect tearing updates to have unpredictable latency,
> especially if they can silently fall back to sync flips, so doing
> multi-CRTC async flips is not useful.
> 
> > > a "ASYNC_FLIP" CRTC prop which the kernel always resets to 0 on commit. I'm not
> > > sure we want/need to cross that bridge right now, it would be easy enough to
> > > add as a second step if some user-space would benefit from it.  
> > 
> > Technically it should really be per-plane since that is what does
> > the flip. But I have a feeling that allowing a mix of async and
> > sync in the same commit is just going to make everything more
> > complicated without really helping anything (async flips won't
> > happen atomically anyway with anything else).
> > 
> > One (crazy?) idea I had for the atomic api is that we could even
> > reject most of the properties already on the uapi level before anyone
> > gets to examine the final state. Ie. as soon as the atomic ioctl sees
> > eg. a gamma LUT property change it would just immediately return
> > an error if async flip is also requested.
> 
> I agree with these two paragraphs.
> 
> What about limiting async flag to atomic commits that update only a
> single KMS plane (regardless of how many planes are active)? Maybe that
> would be a good first step?

Sure, that might help a bit. But I'm rather more worried about
all the state that you can now include in the atomic commit but
aren't actually allowed to change. Eg. probably all crtc/connector
properties, and most plane properties. The legacy ioctl blocks most
of that implicitly, but the atomic ioctl does not. Hence my idea
about rejecting basically everything except the plane fb property
changes already at the ioctl level. Would avoid that stuff
leaking in by accident if the driver forgets to check absolutely
everything.
Simon Ser Aug. 30, 2022, 12:33 p.m. UTC | #7
On Tuesday, August 30th, 2022 at 10:08, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:

> > In the documentation patch discussion [1], it appears it's not clear what
> > drivers should do when async flip isn't possible with the legacy uAPI.
> >
> > For the atomic uAPI, we need to pick on of these two approaches:
> >
> > 1. Let the kernel fall back to a sync flip if async isn't possible. This
> >    simplifies user-space, but then user-space has no reliable way to figure out
> >    what really happened (sync or async?). That could be fixed with a new
> >    read-only CRTC prop indicating whether the last flip was async or not.
> >    However, maybe someone will come up in the future with user-space which
> >    needs to only apply an update if async flip is possible, in which case this
> >    approach falls short.
> > 2. Make the kernel return EINVAL if async flip isn't possible. This adds more
> >    complexity to user-space, but enables a more reliable and deterministic
> >    uAPI. This is also more consistent with the rest of the existing atomic
> >    uAPI.
> 
> The current behaviour is somewhat a combination of the two. We return
> an error if async flip is not possible at all given the current state.
> 
> When async flip is possible we return success, but may still internally
> fall back to a sync flip for the first flip. That is required on some
> borked hardware that can't switch from sync flips to async flips without
> doing an extra sync flip. Also on some other hardware we intentionally
> fall back to a sync flip for the first async flip, so that we can
> reprogram some display FIFO stuff (aimed to make the subsequent async
> flips faster).

Hm. Would it be possible for the atomic uAPI to return EINVAL in this case too,
to let user-space know what really happened? I suppose user-space could then
(mistakenly) assume that async flip is never possible, and never try again?

> > Note, current user-space would only need to opportunistically enable async
> > flip. IOW, I think that for current user-space plans "async if possible,
> > otherwise sync" is good enough. That behavior maps well to the Vulkan present
> > modes as well (which says that "this mode *may* result in visible tearing", but
> > doesn't guarantee it).
> 
> The current behaviour is to fall back to a blit if the async
> flip fails. So you still get the same effective behaviour, just
> not as efficient. I think that's a reasonable way to handle it.

After some discussion on IRC: the above describes Xorg's behavior.

To reproduce this behavior with the atomic uAPI, it is necessary to use
approach (2): make the atomic commit fail if async flip isn't possible, to let
user-space fall back to a blit.

> > Another possible shortcoming of the proposed new uAPI here is that user-space
> > cannot submit a single atomic commit which updates multiple CRTCs, and
> > individually select which CRTC does an async flip. This could be fixed with
> > a "ASYNC_FLIP" CRTC prop which the kernel always resets to 0 on commit. I'm not
> > sure we want/need to cross that bridge right now, it would be easy enough to
> > add as a second step if some user-space would benefit from it.
> 
> Technically it should really be per-plane since that is what does
> the flip. But I have a feeling that allowing a mix of async and
> sync in the same commit is just going to make everything more
> complicated without really helping anything (async flips won't
> happen atomically anyway with anything else).

Agreed.

> One (crazy?) idea I had for the atomic api is that we could even
> reject most of the properties already on the uapi level before anyone
> gets to examine the final state. Ie. as soon as the atomic ioctl sees
> eg. a gamma LUT property change it would just immediately return
> an error if async flip is also requested.

Hm, I guess... Although amdgpu doesn't really need this, it already has all
of the logic checking for stuff like gamma LUT property change + async. Could
maybe be a DRM helper if other drivers need it.
Simon Ser Aug. 30, 2022, 12:40 p.m. UTC | #8
On Tuesday, August 30th, 2022 at 12:24, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:

> > > The current behaviour is to fall back to a blit if the async
> > > flip fails. So you still get the same effective behaviour, just
> > > not as efficient. I think that's a reasonable way to handle it.
> >
> > That's purely an Xorg thing, right?
> 
> Yes.
> 
> >
> > Should Wayland compositors implement the same thing is a good question.
> 
> Is the whole tear+wayland situation more or less up in the air still?

The goal of the patches I'm sending is to allow Wayland compositors to request
tearing. We also have a WIP protocol extension for this [1].

[1]: https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/65
Simon Ser Aug. 30, 2022, 12:58 p.m. UTC | #9
(Oops, I replied to the wrong thread. Re-sending to the correct one.)

On Tuesday, August 30th, 2022 at 10:41, Michel Dänzer <michel.daenzer@mailbox.org> wrote:

> > For the atomic uAPI, we need to pick on of these two approaches:
> >
> > 1. Let the kernel fall back to a sync flip if async isn't possible. This
> >    simplifies user-space, but then user-space has no reliable way to figure out
> >    what really happened (sync or async?). That could be fixed with a new
> >    read-only CRTC prop indicating whether the last flip was async or not.
> >    However, maybe someone will come up in the future with user-space which
> >    needs to only apply an update if async flip is possible, in which case this
> >    approach falls short.
>
> The future is now. :)
>
> As I described in the documentation patch discussion, this approach would
> make it tricky for a Wayland compositor to decide if it should use an async
> commit (which needs to be done ASAP to serve the intended purpose) or not (in
> which case the compositor may want to delay the commit as long as possible
> for minimal latency).

Ah right. If an async page-flip is not possible, then a Wayland compositor
could want to wait the "last moment" before the next vblank to see if a more
up-to-date buffer is available.

With that + Xorg usage, I think we have a rather solid case for failing async
flips rather than silently falling back to vsync.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index ca2a6e6101dc..5b1591e2b46c 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -302,6 +302,11 @@  static int drm_getcap(struct drm_device *dev, void *data, struct drm_file *file_
 	case DRM_CAP_CRTC_IN_VBLANK_EVENT:
 		req->value = 1;
 		break;
+	case DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP:
+		req->value = drm_core_check_feature(dev, DRIVER_ATOMIC) &&
+			     dev->mode_config.async_page_flip &&
+			     !dev->mode_config.atomic_async_page_flip_not_supported;
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 642808520d92..b1962628ecda 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -706,7 +706,8 @@  struct drm_gem_open {
 /**
  * DRM_CAP_ASYNC_PAGE_FLIP
  *
- * If set to 1, the driver supports &DRM_MODE_PAGE_FLIP_ASYNC.
+ * If set to 1, the driver supports &DRM_MODE_PAGE_FLIP_ASYNC for legacy
+ * page-flips.
  */
 #define DRM_CAP_ASYNC_PAGE_FLIP		0x7
 /**
@@ -767,6 +768,13 @@  struct drm_gem_open {
  * Documentation/gpu/drm-mm.rst, section "DRM Sync Objects".
  */
 #define DRM_CAP_SYNCOBJ_TIMELINE	0x14
+/**
+ * DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP
+ *
+ * If set to 1, the driver supports &DRM_MODE_PAGE_FLIP_ASYNC for atomic
+ * commits.
+ */
+#define DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP	0x15
 
 /* DRM_IOCTL_GET_CAP ioctl argument type */
 struct drm_get_cap {