[v4,06/14] dma-buf/sync_file: Support (E)POLLPRI

Message ID	20230218211608.1630586-7-robdclark@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-media-owner@vger.kernel.org> From: Rob Clark <robdclark@gmail.com> To: dri-devel@lists.freedesktop.org Cc: freedreno@lists.freedesktop.org, Daniel Vetter <daniel@ffwll.ch>, =?utf-8?q?Christian_K=C3=B6nig?= <ckoenig.leichtzumerken@gmail.com>, =?utf-8?q?Michel_D=C3=A4nzer?= <michel@daenzer.net>, Tvrtko Ursulin <tvrtko.ursulin@intel.com>, Rodrigo Vivi <rodrigo.vivi@intel.com>, Alex Deucher <alexander.deucher@amd.com>, Pekka Paalanen <ppaalanen@gmail.com>, Simon Ser <contact@emersion.fr>, Rob Clark <robdclark@chromium.org>, Sumit Semwal <sumit.semwal@linaro.org>, Gustavo Padovan <gustavo@padovan.org>, =?utf-8?q?Christian_K=C3=B6nig?= <christian.koenig@amd.com>, linux-media@vger.kernel.org (open list:SYNC FILE FRAMEWORK), linaro-mm-sig@lists.linaro.org (moderated list:DMA BUFFER SHARING FRAMEWORK), linux-kernel@vger.kernel.org (open list) Subject: [PATCH v4 06/14] dma-buf/sync_file: Support (E)POLLPRI Date: Sat, 18 Feb 2023 13:15:49 -0800 Message-Id: <20230218211608.1630586-7-robdclark@gmail.com> In-Reply-To: <20230218211608.1630586-1-robdclark@gmail.com> References: <20230218211608.1630586-1-robdclark@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	dma-fence: Deadline awareness \| expand [v4,00/14] dma-fence: Deadline awareness [v4,01/14] dma-buf/dma-fence: Add deadline awareness [v4,02/14] dma-buf/fence-array: Add fence deadline support [v4,03/14] dma-buf/fence-chain: Add fence deadline support [v4,04/14] dma-buf/dma-resv: Add a way to set fence deadline [v4,05/14] dma-buf/sync_file: Add SET_DEADLINE ioctl [v4,06/14] dma-buf/sync_file: Support (E)POLLPRI [v4,07/14] dma-buf/sw_sync: Add fence deadline support [v4,08/14] drm/scheduler: Add fence deadline support [v4,12/14] drm/msm: Add deadline based boost support [v4,14/14] drm/i915: Add deadline based boost support

Rob Clark Feb. 18, 2023, 9:15 p.m. UTC

From: Rob Clark <robdclark@chromium.org>

Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent
wait (as opposed to a "housekeeping" wait to know when to cleanup after
some work has completed).  Usermode components of GPU driver stacks
often poll() on fence fd's to know when it is safe to do things like
free or reuse a buffer, but they can also poll() on a fence fd when
waiting to read back results from the GPU.  The EPOLLPRI/POLLPRI flag
lets the kernel differentiate these two cases.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/dma-buf/sync_file.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Christian König Feb. 20, 2023, 8:31 a.m. UTC | #1

Am 18.02.23 um 22:15 schrieb Rob Clark:
> From: Rob Clark <robdclark@chromium.org>
>
> Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent
> wait (as opposed to a "housekeeping" wait to know when to cleanup after
> some work has completed).  Usermode components of GPU driver stacks
> often poll() on fence fd's to know when it is safe to do things like
> free or reuse a buffer, but they can also poll() on a fence fd when
> waiting to read back results from the GPU.  The EPOLLPRI/POLLPRI flag
> lets the kernel differentiate these two cases.
>
> Signed-off-by: Rob Clark <robdclark@chromium.org>

The code looks clean, but the different poll flags and their meaning are 
certainly not my field of expertise.

Feel free to add Acked-by: Christian König <christian.koenig@amd.com>, 
somebody with more background in this should probably take a look as well.

Regards,
Christian.

> ---
>   drivers/dma-buf/sync_file.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
>
> diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> index fb6ca1032885..c30b2085ee0a 100644
> --- a/drivers/dma-buf/sync_file.c
> +++ b/drivers/dma-buf/sync_file.c
> @@ -192,6 +192,14 @@ static __poll_t sync_file_poll(struct file *file, poll_table *wait)
>   {
>   	struct sync_file *sync_file = file->private_data;
>   
> +	/*
> +	 * The POLLPRI/EPOLLPRI flag can be used to signal that
> +	 * userspace wants the fence to signal ASAP, express this
> +	 * as an immediate deadline.
> +	 */
> +	if (poll_requested_events(wait) & EPOLLPRI)
> +		dma_fence_set_deadline(sync_file->fence, ktime_get());
> +
>   	poll_wait(file, &sync_file->wq, wait);
>   
>   	if (list_empty(&sync_file->cb.node) &&

Pekka Paalanen Feb. 20, 2023, 8:53 a.m. UTC | #2

On Sat, 18 Feb 2023 13:15:49 -0800
Rob Clark <robdclark@gmail.com> wrote:

> From: Rob Clark <robdclark@chromium.org>
> 
> Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent
> wait (as opposed to a "housekeeping" wait to know when to cleanup after
> some work has completed).  Usermode components of GPU driver stacks
> often poll() on fence fd's to know when it is safe to do things like
> free or reuse a buffer, but they can also poll() on a fence fd when
> waiting to read back results from the GPU.  The EPOLLPRI/POLLPRI flag
> lets the kernel differentiate these two cases.
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>

Hi,

where would the UAPI documentation of this go?
It seems to be missing.

If a Wayland compositor is polling application fences to know which
client buffer to use in its rendering, should the compositor poll with
PRI or not? If a compositor polls with PRI, then all fences from all
applications would always be PRI. Would that be harmful somehow or
would it be beneficial?


Thanks,
pq

> ---
>  drivers/dma-buf/sync_file.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> index fb6ca1032885..c30b2085ee0a 100644
> --- a/drivers/dma-buf/sync_file.c
> +++ b/drivers/dma-buf/sync_file.c
> @@ -192,6 +192,14 @@ static __poll_t sync_file_poll(struct file *file, poll_table *wait)
>  {
>  	struct sync_file *sync_file = file->private_data;
>  
> +	/*
> +	 * The POLLPRI/EPOLLPRI flag can be used to signal that
> +	 * userspace wants the fence to signal ASAP, express this
> +	 * as an immediate deadline.
> +	 */
> +	if (poll_requested_events(wait) & EPOLLPRI)
> +		dma_fence_set_deadline(sync_file->fence, ktime_get());
> +
>  	poll_wait(file, &sync_file->wq, wait);
>  
>  	if (list_empty(&sync_file->cb.node) &&

Rob Clark Feb. 20, 2023, 4:14 p.m. UTC | #3

On Mon, Feb 20, 2023 at 12:53 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>
> On Sat, 18 Feb 2023 13:15:49 -0800
> Rob Clark <robdclark@gmail.com> wrote:
>
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent
> > wait (as opposed to a "housekeeping" wait to know when to cleanup after
> > some work has completed).  Usermode components of GPU driver stacks
> > often poll() on fence fd's to know when it is safe to do things like
> > free or reuse a buffer, but they can also poll() on a fence fd when
> > waiting to read back results from the GPU.  The EPOLLPRI/POLLPRI flag
> > lets the kernel differentiate these two cases.
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
>
> Hi,
>
> where would the UAPI documentation of this go?
> It seems to be missing.

Good question, I am not sure.  The poll() man page has a description,
but my usage doesn't fit that _exactly_ (but OTOH the description is a
bit vague).

> If a Wayland compositor is polling application fences to know which
> client buffer to use in its rendering, should the compositor poll with
> PRI or not? If a compositor polls with PRI, then all fences from all
> applications would always be PRI. Would that be harmful somehow or
> would it be beneficial?

I think a compositor would rather use the deadline ioctl and then poll
without PRI.  Otherwise you are giving an urgency signal to the fence
signaller which might not necessarily be needed.

The places where I expect PRI to be useful is more in mesa (things
like glFinish(), readpix, and other similar sorts of blocking APIs)

BR,
-R

>
>
> Thanks,
> pq
>
> > ---
> >  drivers/dma-buf/sync_file.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > index fb6ca1032885..c30b2085ee0a 100644
> > --- a/drivers/dma-buf/sync_file.c
> > +++ b/drivers/dma-buf/sync_file.c
> > @@ -192,6 +192,14 @@ static __poll_t sync_file_poll(struct file *file, poll_table *wait)
> >  {
> >       struct sync_file *sync_file = file->private_data;
> >
> > +     /*
> > +      * The POLLPRI/EPOLLPRI flag can be used to signal that
> > +      * userspace wants the fence to signal ASAP, express this
> > +      * as an immediate deadline.
> > +      */
> > +     if (poll_requested_events(wait) & EPOLLPRI)
> > +             dma_fence_set_deadline(sync_file->fence, ktime_get());
> > +
> >       poll_wait(file, &sync_file->wq, wait);
> >
> >       if (list_empty(&sync_file->cb.node) &&
>

Pekka Paalanen Feb. 21, 2023, 8:37 a.m. UTC | #4

On Mon, 20 Feb 2023 08:14:47 -0800
Rob Clark <robdclark@gmail.com> wrote:

> On Mon, Feb 20, 2023 at 12:53 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >
> > On Sat, 18 Feb 2023 13:15:49 -0800
> > Rob Clark <robdclark@gmail.com> wrote:
> >  
> > > From: Rob Clark <robdclark@chromium.org>
> > >
> > > Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent
> > > wait (as opposed to a "housekeeping" wait to know when to cleanup after
> > > some work has completed).  Usermode components of GPU driver stacks
> > > often poll() on fence fd's to know when it is safe to do things like
> > > free or reuse a buffer, but they can also poll() on a fence fd when
> > > waiting to read back results from the GPU.  The EPOLLPRI/POLLPRI flag
> > > lets the kernel differentiate these two cases.
> > >
> > > Signed-off-by: Rob Clark <robdclark@chromium.org>  
> >
> > Hi,
> >
> > where would the UAPI documentation of this go?
> > It seems to be missing.  
> 
> Good question, I am not sure.  The poll() man page has a description,
> but my usage doesn't fit that _exactly_ (but OTOH the description is a
> bit vague).
> 
> > If a Wayland compositor is polling application fences to know which
> > client buffer to use in its rendering, should the compositor poll with
> > PRI or not? If a compositor polls with PRI, then all fences from all
> > applications would always be PRI. Would that be harmful somehow or
> > would it be beneficial?  
> 
> I think a compositor would rather use the deadline ioctl and then poll
> without PRI.  Otherwise you are giving an urgency signal to the fence
> signaller which might not necessarily be needed.
> 
> The places where I expect PRI to be useful is more in mesa (things
> like glFinish(), readpix, and other similar sorts of blocking APIs)

Sounds good. Docs... ;-)

Hmm, so a compositor should set the deadline when it processes the
wl_surface.commit, and not when it actually starts repainting, to give
time for the driver to react and the GPU to do some more work. The
deadline would be the time when the compositor starts its repaint, so
it knows if the buffer is ready or not.


Thanks,
pq


> 
> BR,
> -R
> 
> >
> >
> > Thanks,
> > pq
> >  
> > > ---
> > >  drivers/dma-buf/sync_file.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > > index fb6ca1032885..c30b2085ee0a 100644
> > > --- a/drivers/dma-buf/sync_file.c
> > > +++ b/drivers/dma-buf/sync_file.c
> > > @@ -192,6 +192,14 @@ static __poll_t sync_file_poll(struct file *file, poll_table *wait)
> > >  {
> > >       struct sync_file *sync_file = file->private_data;
> > >
> > > +     /*
> > > +      * The POLLPRI/EPOLLPRI flag can be used to signal that
> > > +      * userspace wants the fence to signal ASAP, express this
> > > +      * as an immediate deadline.
> > > +      */
> > > +     if (poll_requested_events(wait) & EPOLLPRI)
> > > +             dma_fence_set_deadline(sync_file->fence, ktime_get());
> > > +
> > >       poll_wait(file, &sync_file->wq, wait);
> > >
> > >       if (list_empty(&sync_file->cb.node) &&  
> >

Pekka Paalanen Feb. 21, 2023, 8:38 a.m. UTC | #5

On Mon, 20 Feb 2023 09:31:41 +0100
Christian König <christian.koenig@amd.com> wrote:

> Am 18.02.23 um 22:15 schrieb Rob Clark:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent
> > wait (as opposed to a "housekeeping" wait to know when to cleanup after
> > some work has completed).  Usermode components of GPU driver stacks
> > often poll() on fence fd's to know when it is safe to do things like
> > free or reuse a buffer, but they can also poll() on a fence fd when
> > waiting to read back results from the GPU.  The EPOLLPRI/POLLPRI flag
> > lets the kernel differentiate these two cases.
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>  
> 
> The code looks clean, but the different poll flags and their meaning are 
> certainly not my field of expertise.


A good question. epoll_ctl manual refers to poll(2) which says:

       POLLPRI
              There is some exceptional condition on the file descriptor.  Possibilities include:

              • There is out-of-band data on a TCP socket (see tcp(7)).

              • A pseudoterminal master in packet mode has seen a state change on the slave (see ioctl_tty(2)).

              • A cgroup.events file has been modified (see cgroups(7)).

It seems to be about selecting what events will trigger the poll,
more than how (fast) to poll. At least it is not documented to be
ignored in 'events', so I guess it should work.


Thanks,
pq

> Feel free to add Acked-by: Christian König <christian.koenig@amd.com>, 
> somebody with more background in this should probably take a look as well.
> 
> Regards,
> Christian.
> 
> > ---
> >   drivers/dma-buf/sync_file.c | 8 ++++++++
> >   1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > index fb6ca1032885..c30b2085ee0a 100644
> > --- a/drivers/dma-buf/sync_file.c
> > +++ b/drivers/dma-buf/sync_file.c
> > @@ -192,6 +192,14 @@ static __poll_t sync_file_poll(struct file *file, poll_table *wait)
> >   {
> >   	struct sync_file *sync_file = file->private_data;
> >   
> > +	/*
> > +	 * The POLLPRI/EPOLLPRI flag can be used to signal that
> > +	 * userspace wants the fence to signal ASAP, express this
> > +	 * as an immediate deadline.
> > +	 */
> > +	if (poll_requested_events(wait) & EPOLLPRI)
> > +		dma_fence_set_deadline(sync_file->fence, ktime_get());
> > +
> >   	poll_wait(file, &sync_file->wq, wait);
> >   
> >   	if (list_empty(&sync_file->cb.node) &&  
>

Sebastian Wick Feb. 21, 2023, 4:01 p.m. UTC | #6

On Tue, Feb 21, 2023 at 9:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>
> On Mon, 20 Feb 2023 08:14:47 -0800
> Rob Clark <robdclark@gmail.com> wrote:
>
> > On Mon, Feb 20, 2023 at 12:53 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > >
> > > On Sat, 18 Feb 2023 13:15:49 -0800
> > > Rob Clark <robdclark@gmail.com> wrote:
> > >
> > > > From: Rob Clark <robdclark@chromium.org>
> > > >
> > > > Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent
> > > > wait (as opposed to a "housekeeping" wait to know when to cleanup after
> > > > some work has completed).  Usermode components of GPU driver stacks
> > > > often poll() on fence fd's to know when it is safe to do things like
> > > > free or reuse a buffer, but they can also poll() on a fence fd when
> > > > waiting to read back results from the GPU.  The EPOLLPRI/POLLPRI flag
> > > > lets the kernel differentiate these two cases.
> > > >
> > > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > >
> > > Hi,
> > >
> > > where would the UAPI documentation of this go?
> > > It seems to be missing.
> >
> > Good question, I am not sure.  The poll() man page has a description,
> > but my usage doesn't fit that _exactly_ (but OTOH the description is a
> > bit vague).
> >
> > > If a Wayland compositor is polling application fences to know which
> > > client buffer to use in its rendering, should the compositor poll with
> > > PRI or not? If a compositor polls with PRI, then all fences from all
> > > applications would always be PRI. Would that be harmful somehow or
> > > would it be beneficial?
> >
> > I think a compositor would rather use the deadline ioctl and then poll
> > without PRI.  Otherwise you are giving an urgency signal to the fence
> > signaller which might not necessarily be needed.
> >
> > The places where I expect PRI to be useful is more in mesa (things
> > like glFinish(), readpix, and other similar sorts of blocking APIs)
>
> Sounds good. Docs... ;-)
>
> Hmm, so a compositor should set the deadline when it processes the
> wl_surface.commit, and not when it actually starts repainting, to give
> time for the driver to react and the GPU to do some more work. The
> deadline would be the time when the compositor starts its repaint, so
> it knows if the buffer is ready or not.

Technically we don't know when the commit is supposed to be shown.
Just passing a deadline of the next possible deadline however is
probably a good enough guess for this feature to be useful.

One thing that neither API allows us to do is tell the kernel in
advance when we're going to submit work and what the deadline for it
is and unfortunately that work is the most timing sensitive.

>
>
> Thanks,
> pq
>
>
> >
> > BR,
> > -R
> >
> > >
> > >
> > > Thanks,
> > > pq
> > >
> > > > ---
> > > >  drivers/dma-buf/sync_file.c | 8 ++++++++
> > > >  1 file changed, 8 insertions(+)
> > > >
> > > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > > > index fb6ca1032885..c30b2085ee0a 100644
> > > > --- a/drivers/dma-buf/sync_file.c
> > > > +++ b/drivers/dma-buf/sync_file.c
> > > > @@ -192,6 +192,14 @@ static __poll_t sync_file_poll(struct file *file, poll_table *wait)
> > > >  {
> > > >       struct sync_file *sync_file = file->private_data;
> > > >
> > > > +     /*
> > > > +      * The POLLPRI/EPOLLPRI flag can be used to signal that
> > > > +      * userspace wants the fence to signal ASAP, express this
> > > > +      * as an immediate deadline.
> > > > +      */
> > > > +     if (poll_requested_events(wait) & EPOLLPRI)
> > > > +             dma_fence_set_deadline(sync_file->fence, ktime_get());
> > > > +
> > > >       poll_wait(file, &sync_file->wq, wait);
> > > >
> > > >       if (list_empty(&sync_file->cb.node) &&
> > >
>

Luben Tuikov Feb. 21, 2023, 4:48 p.m. UTC | #7

On 2023-02-20 11:14, Rob Clark wrote:
> On Mon, Feb 20, 2023 at 12:53 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>>
>> On Sat, 18 Feb 2023 13:15:49 -0800
>> Rob Clark <robdclark@gmail.com> wrote:
>>
>>> From: Rob Clark <robdclark@chromium.org>
>>>
>>> Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent
>>> wait (as opposed to a "housekeeping" wait to know when to cleanup after
>>> some work has completed).  Usermode components of GPU driver stacks
>>> often poll() on fence fd's to know when it is safe to do things like
>>> free or reuse a buffer, but they can also poll() on a fence fd when
>>> waiting to read back results from the GPU.  The EPOLLPRI/POLLPRI flag
>>> lets the kernel differentiate these two cases.
>>>
>>> Signed-off-by: Rob Clark <robdclark@chromium.org>
>>
>> Hi,
>>
>> where would the UAPI documentation of this go?
>> It seems to be missing.
> 
> Good question, I am not sure.  The poll() man page has a description,
> but my usage doesn't fit that _exactly_ (but OTOH the description is a
> bit vague).
> 
>> If a Wayland compositor is polling application fences to know which
>> client buffer to use in its rendering, should the compositor poll with
>> PRI or not? If a compositor polls with PRI, then all fences from all
>> applications would always be PRI. Would that be harmful somehow or
>> would it be beneficial?
> 
> I think a compositor would rather use the deadline ioctl and then poll
> without PRI.  Otherwise you are giving an urgency signal to the fence
> signaller which might not necessarily be needed.
> 
> The places where I expect PRI to be useful is more in mesa (things
> like glFinish(), readpix, and other similar sorts of blocking APIs)
Hi,

Hmm, but then user-space could do the opposite, namely, submit work as usual--never 
using the SET_DEADLINE ioctl, and then at the end, poll using (E)POLLPRI. That seems
like a possible usage pattern, unintended--maybe, but possible. Do we want to discourage
this? Wouldn't SET_DEADLINE be enough? I mean, one can call SET_DEADLINE with the current
time, and then wouldn't that be equivalent to (E)POLLPRI?

Rob Clark Feb. 21, 2023, 5:53 p.m. UTC | #8

On Tue, Feb 21, 2023 at 8:48 AM Luben Tuikov <luben.tuikov@amd.com> wrote:
>
> On 2023-02-20 11:14, Rob Clark wrote:
> > On Mon, Feb 20, 2023 at 12:53 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >>
> >> On Sat, 18 Feb 2023 13:15:49 -0800
> >> Rob Clark <robdclark@gmail.com> wrote:
> >>
> >>> From: Rob Clark <robdclark@chromium.org>
> >>>
> >>> Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent
> >>> wait (as opposed to a "housekeeping" wait to know when to cleanup after
> >>> some work has completed).  Usermode components of GPU driver stacks
> >>> often poll() on fence fd's to know when it is safe to do things like
> >>> free or reuse a buffer, but they can also poll() on a fence fd when
> >>> waiting to read back results from the GPU.  The EPOLLPRI/POLLPRI flag
> >>> lets the kernel differentiate these two cases.
> >>>
> >>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> >>
> >> Hi,
> >>
> >> where would the UAPI documentation of this go?
> >> It seems to be missing.
> >
> > Good question, I am not sure.  The poll() man page has a description,
> > but my usage doesn't fit that _exactly_ (but OTOH the description is a
> > bit vague).
> >
> >> If a Wayland compositor is polling application fences to know which
> >> client buffer to use in its rendering, should the compositor poll with
> >> PRI or not? If a compositor polls with PRI, then all fences from all
> >> applications would always be PRI. Would that be harmful somehow or
> >> would it be beneficial?
> >
> > I think a compositor would rather use the deadline ioctl and then poll
> > without PRI.  Otherwise you are giving an urgency signal to the fence
> > signaller which might not necessarily be needed.
> >
> > The places where I expect PRI to be useful is more in mesa (things
> > like glFinish(), readpix, and other similar sorts of blocking APIs)
> Hi,
>
> Hmm, but then user-space could do the opposite, namely, submit work as usual--never
> using the SET_DEADLINE ioctl, and then at the end, poll using (E)POLLPRI. That seems
> like a possible usage pattern, unintended--maybe, but possible. Do we want to discourage
> this? Wouldn't SET_DEADLINE be enough? I mean, one can call SET_DEADLINE with the current
> time, and then wouldn't that be equivalent to (E)POLLPRI?

Yeah, (E)POLLPRI isn't strictly needed if we have SET_DEADLINE.  It is
slightly more convenient if you want an immediate deadline (single
syscall instead of two), but not strictly needed.  OTOH it piggy-backs
on existing UABI.

BR,
-R

Rob Clark Feb. 21, 2023, 5:55 p.m. UTC | #9

On Tue, Feb 21, 2023 at 8:01 AM Sebastian Wick
<sebastian.wick@redhat.com> wrote:
>
> On Tue, Feb 21, 2023 at 9:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >
> > On Mon, 20 Feb 2023 08:14:47 -0800
> > Rob Clark <robdclark@gmail.com> wrote:
> >
> > > On Mon, Feb 20, 2023 at 12:53 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > >
> > > > On Sat, 18 Feb 2023 13:15:49 -0800
> > > > Rob Clark <robdclark@gmail.com> wrote:
> > > >
> > > > > From: Rob Clark <robdclark@chromium.org>
> > > > >
> > > > > Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent
> > > > > wait (as opposed to a "housekeeping" wait to know when to cleanup after
> > > > > some work has completed).  Usermode components of GPU driver stacks
> > > > > often poll() on fence fd's to know when it is safe to do things like
> > > > > free or reuse a buffer, but they can also poll() on a fence fd when
> > > > > waiting to read back results from the GPU.  The EPOLLPRI/POLLPRI flag
> > > > > lets the kernel differentiate these two cases.
> > > > >
> > > > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > >
> > > > Hi,
> > > >
> > > > where would the UAPI documentation of this go?
> > > > It seems to be missing.
> > >
> > > Good question, I am not sure.  The poll() man page has a description,
> > > but my usage doesn't fit that _exactly_ (but OTOH the description is a
> > > bit vague).
> > >
> > > > If a Wayland compositor is polling application fences to know which
> > > > client buffer to use in its rendering, should the compositor poll with
> > > > PRI or not? If a compositor polls with PRI, then all fences from all
> > > > applications would always be PRI. Would that be harmful somehow or
> > > > would it be beneficial?
> > >
> > > I think a compositor would rather use the deadline ioctl and then poll
> > > without PRI.  Otherwise you are giving an urgency signal to the fence
> > > signaller which might not necessarily be needed.
> > >
> > > The places where I expect PRI to be useful is more in mesa (things
> > > like glFinish(), readpix, and other similar sorts of blocking APIs)
> >
> > Sounds good. Docs... ;-)
> >
> > Hmm, so a compositor should set the deadline when it processes the
> > wl_surface.commit, and not when it actually starts repainting, to give
> > time for the driver to react and the GPU to do some more work. The
> > deadline would be the time when the compositor starts its repaint, so
> > it knows if the buffer is ready or not.
>
> Technically we don't know when the commit is supposed to be shown.
> Just passing a deadline of the next possible deadline however is
> probably a good enough guess for this feature to be useful.
>
> One thing that neither API allows us to do is tell the kernel in
> advance when we're going to submit work and what the deadline for it
> is and unfortunately that work is the most timing sensitive.

Presumably you are talking about the final compositing step?
Elsewhere in this series that atomic wait-for-fences step sets the
deadline hint.

BR,
-R

> >
> >
> > Thanks,
> > pq
> >
> >
> > >
> > > BR,
> > > -R
> > >
> > > >
> > > >
> > > > Thanks,
> > > > pq
> > > >
> > > > > ---
> > > > >  drivers/dma-buf/sync_file.c | 8 ++++++++
> > > > >  1 file changed, 8 insertions(+)
> > > > >
> > > > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > > > > index fb6ca1032885..c30b2085ee0a 100644
> > > > > --- a/drivers/dma-buf/sync_file.c
> > > > > +++ b/drivers/dma-buf/sync_file.c
> > > > > @@ -192,6 +192,14 @@ static __poll_t sync_file_poll(struct file *file, poll_table *wait)
> > > > >  {
> > > > >       struct sync_file *sync_file = file->private_data;
> > > > >
> > > > > +     /*
> > > > > +      * The POLLPRI/EPOLLPRI flag can be used to signal that
> > > > > +      * userspace wants the fence to signal ASAP, express this
> > > > > +      * as an immediate deadline.
> > > > > +      */
> > > > > +     if (poll_requested_events(wait) & EPOLLPRI)
> > > > > +             dma_fence_set_deadline(sync_file->fence, ktime_get());
> > > > > +
> > > > >       poll_wait(file, &sync_file->wq, wait);
> > > > >
> > > > >       if (list_empty(&sync_file->cb.node) &&
> > > >
> >
>

Pekka Paalanen Feb. 22, 2023, 9:49 a.m. UTC | #10

On Tue, 21 Feb 2023 09:53:56 -0800
Rob Clark <robdclark@gmail.com> wrote:

> On Tue, Feb 21, 2023 at 8:48 AM Luben Tuikov <luben.tuikov@amd.com> wrote:
> >
> > On 2023-02-20 11:14, Rob Clark wrote:  
> > > On Mon, Feb 20, 2023 at 12:53 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:  
> > >>
> > >> On Sat, 18 Feb 2023 13:15:49 -0800
> > >> Rob Clark <robdclark@gmail.com> wrote:
> > >>  
> > >>> From: Rob Clark <robdclark@chromium.org>
> > >>>
> > >>> Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent
> > >>> wait (as opposed to a "housekeeping" wait to know when to cleanup after
> > >>> some work has completed).  Usermode components of GPU driver stacks
> > >>> often poll() on fence fd's to know when it is safe to do things like
> > >>> free or reuse a buffer, but they can also poll() on a fence fd when
> > >>> waiting to read back results from the GPU.  The EPOLLPRI/POLLPRI flag
> > >>> lets the kernel differentiate these two cases.
> > >>>
> > >>> Signed-off-by: Rob Clark <robdclark@chromium.org>  
> > >>
> > >> Hi,
> > >>
> > >> where would the UAPI documentation of this go?
> > >> It seems to be missing.  
> > >
> > > Good question, I am not sure.  The poll() man page has a description,
> > > but my usage doesn't fit that _exactly_ (but OTOH the description is a
> > > bit vague).
> > >  
> > >> If a Wayland compositor is polling application fences to know which
> > >> client buffer to use in its rendering, should the compositor poll with
> > >> PRI or not? If a compositor polls with PRI, then all fences from all
> > >> applications would always be PRI. Would that be harmful somehow or
> > >> would it be beneficial?  
> > >
> > > I think a compositor would rather use the deadline ioctl and then poll
> > > without PRI.  Otherwise you are giving an urgency signal to the fence
> > > signaller which might not necessarily be needed.
> > >
> > > The places where I expect PRI to be useful is more in mesa (things
> > > like glFinish(), readpix, and other similar sorts of blocking APIs)  
> > Hi,
> >
> > Hmm, but then user-space could do the opposite, namely, submit work as usual--never
> > using the SET_DEADLINE ioctl, and then at the end, poll using (E)POLLPRI. That seems
> > like a possible usage pattern, unintended--maybe, but possible. Do we want to discourage
> > this? Wouldn't SET_DEADLINE be enough? I mean, one can call SET_DEADLINE with the current
> > time, and then wouldn't that be equivalent to (E)POLLPRI?  
> 
> Yeah, (E)POLLPRI isn't strictly needed if we have SET_DEADLINE.  It is
> slightly more convenient if you want an immediate deadline (single
> syscall instead of two), but not strictly needed.  OTOH it piggy-backs
> on existing UABI.

In that case, I would be conservative, and not add the POLLPRI
semantics. An UAPI addition that is not strictly needed and somewhat
unclear if it violates any design principles is best not done, until it
is proven to be beneficial.

Besides, a Wayland compositor does not necessary need to add the fd
to its main event loop for poll. It could just SET_DEADLINE, and then
when it renders simply check if the fence passed or not already. Not
polling means the compositor does not need to wake up at the moment the
fence signals to just record a flag.

On another matter, if the application uses SET_DEADLINE with one
timestamp, and the compositor uses SET_DEADLINE on the same thing with
another timestamp, what should happen?

Maybe it's a soft-realtime app whose primary goal is not display, and
it needs the result faster than the window server?

Maybe SET_DEADLINE should set the deadline only to an earlier timestamp
and never later?


Thanks,
pq

Luben Tuikov Feb. 22, 2023, 10:26 a.m. UTC | #11

On 2023-02-22 04:49, Pekka Paalanen wrote:
> On Tue, 21 Feb 2023 09:53:56 -0800
> Rob Clark <robdclark@gmail.com> wrote:
> 
>> On Tue, Feb 21, 2023 at 8:48 AM Luben Tuikov <luben.tuikov@amd.com> wrote:
>>>
>>> On 2023-02-20 11:14, Rob Clark wrote:  
>>>> On Mon, Feb 20, 2023 at 12:53 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:  
>>>>>
>>>>> On Sat, 18 Feb 2023 13:15:49 -0800
>>>>> Rob Clark <robdclark@gmail.com> wrote:
>>>>>  
>>>>>> From: Rob Clark <robdclark@chromium.org>
>>>>>>
>>>>>> Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent
>>>>>> wait (as opposed to a "housekeeping" wait to know when to cleanup after
>>>>>> some work has completed).  Usermode components of GPU driver stacks
>>>>>> often poll() on fence fd's to know when it is safe to do things like
>>>>>> free or reuse a buffer, but they can also poll() on a fence fd when
>>>>>> waiting to read back results from the GPU.  The EPOLLPRI/POLLPRI flag
>>>>>> lets the kernel differentiate these two cases.
>>>>>>
>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>  
>>>>>
>>>>> Hi,
>>>>>
>>>>> where would the UAPI documentation of this go?
>>>>> It seems to be missing.  
>>>>
>>>> Good question, I am not sure.  The poll() man page has a description,
>>>> but my usage doesn't fit that _exactly_ (but OTOH the description is a
>>>> bit vague).
>>>>  
>>>>> If a Wayland compositor is polling application fences to know which
>>>>> client buffer to use in its rendering, should the compositor poll with
>>>>> PRI or not? If a compositor polls with PRI, then all fences from all
>>>>> applications would always be PRI. Would that be harmful somehow or
>>>>> would it be beneficial?  
>>>>
>>>> I think a compositor would rather use the deadline ioctl and then poll
>>>> without PRI.  Otherwise you are giving an urgency signal to the fence
>>>> signaller which might not necessarily be needed.
>>>>
>>>> The places where I expect PRI to be useful is more in mesa (things
>>>> like glFinish(), readpix, and other similar sorts of blocking APIs)  
>>> Hi,
>>>
>>> Hmm, but then user-space could do the opposite, namely, submit work as usual--never
>>> using the SET_DEADLINE ioctl, and then at the end, poll using (E)POLLPRI. That seems
>>> like a possible usage pattern, unintended--maybe, but possible. Do we want to discourage
>>> this? Wouldn't SET_DEADLINE be enough? I mean, one can call SET_DEADLINE with the current
>>> time, and then wouldn't that be equivalent to (E)POLLPRI?  
>>
>> Yeah, (E)POLLPRI isn't strictly needed if we have SET_DEADLINE.  It is
>> slightly more convenient if you want an immediate deadline (single
>> syscall instead of two), but not strictly needed.  OTOH it piggy-backs
>> on existing UABI.
> 
> In that case, I would be conservative, and not add the POLLPRI
> semantics. An UAPI addition that is not strictly needed and somewhat
> unclear if it violates any design principles is best not done, until it
> is proven to be beneficial.

That is my sentiment as well. Moreover, on hard-realtime systems,
one would want to set the deadline at the outset and not at poll time.

Rob Clark Feb. 22, 2023, 3:37 p.m. UTC | #12

On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>
> On Tue, 21 Feb 2023 09:53:56 -0800
> Rob Clark <robdclark@gmail.com> wrote:
>
> > On Tue, Feb 21, 2023 at 8:48 AM Luben Tuikov <luben.tuikov@amd.com> wrote:
> > >
> > > On 2023-02-20 11:14, Rob Clark wrote:
> > > > On Mon, Feb 20, 2023 at 12:53 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > >>
> > > >> On Sat, 18 Feb 2023 13:15:49 -0800
> > > >> Rob Clark <robdclark@gmail.com> wrote:
> > > >>
> > > >>> From: Rob Clark <robdclark@chromium.org>
> > > >>>
> > > >>> Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent
> > > >>> wait (as opposed to a "housekeeping" wait to know when to cleanup after
> > > >>> some work has completed).  Usermode components of GPU driver stacks
> > > >>> often poll() on fence fd's to know when it is safe to do things like
> > > >>> free or reuse a buffer, but they can also poll() on a fence fd when
> > > >>> waiting to read back results from the GPU.  The EPOLLPRI/POLLPRI flag
> > > >>> lets the kernel differentiate these two cases.
> > > >>>
> > > >>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > >>
> > > >> Hi,
> > > >>
> > > >> where would the UAPI documentation of this go?
> > > >> It seems to be missing.
> > > >
> > > > Good question, I am not sure.  The poll() man page has a description,
> > > > but my usage doesn't fit that _exactly_ (but OTOH the description is a
> > > > bit vague).
> > > >
> > > >> If a Wayland compositor is polling application fences to know which
> > > >> client buffer to use in its rendering, should the compositor poll with
> > > >> PRI or not? If a compositor polls with PRI, then all fences from all
> > > >> applications would always be PRI. Would that be harmful somehow or
> > > >> would it be beneficial?
> > > >
> > > > I think a compositor would rather use the deadline ioctl and then poll
> > > > without PRI.  Otherwise you are giving an urgency signal to the fence
> > > > signaller which might not necessarily be needed.
> > > >
> > > > The places where I expect PRI to be useful is more in mesa (things
> > > > like glFinish(), readpix, and other similar sorts of blocking APIs)
> > > Hi,
> > >
> > > Hmm, but then user-space could do the opposite, namely, submit work as usual--never
> > > using the SET_DEADLINE ioctl, and then at the end, poll using (E)POLLPRI. That seems
> > > like a possible usage pattern, unintended--maybe, but possible. Do we want to discourage
> > > this? Wouldn't SET_DEADLINE be enough? I mean, one can call SET_DEADLINE with the current
> > > time, and then wouldn't that be equivalent to (E)POLLPRI?
> >
> > Yeah, (E)POLLPRI isn't strictly needed if we have SET_DEADLINE.  It is
> > slightly more convenient if you want an immediate deadline (single
> > syscall instead of two), but not strictly needed.  OTOH it piggy-backs
> > on existing UABI.
>
> In that case, I would be conservative, and not add the POLLPRI
> semantics. An UAPI addition that is not strictly needed and somewhat
> unclear if it violates any design principles is best not done, until it
> is proven to be beneficial.
>
> Besides, a Wayland compositor does not necessary need to add the fd
> to its main event loop for poll. It could just SET_DEADLINE, and then
> when it renders simply check if the fence passed or not already. Not
> polling means the compositor does not need to wake up at the moment the
> fence signals to just record a flag.

poll(POLLPRI) isn't intended for wayland.. but is a thing I want in
mesa for fence waits.  I _could_ use SET_DEADLINE but it is two
syscalls and correspondingly more code ;-)

> On another matter, if the application uses SET_DEADLINE with one
> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> another timestamp, what should happen?

The expectation is that many deadline hints can be set on a fence.
The fence signaller should track the soonest deadline.

BR,
-R

> Maybe it's a soft-realtime app whose primary goal is not display, and
> it needs the result faster than the window server?
>
> Maybe SET_DEADLINE should set the deadline only to an earlier timestamp
> and never later?
>
>
> Thanks,
> pq

Pekka Paalanen Feb. 23, 2023, 9:38 a.m. UTC | #13

On Wed, 22 Feb 2023 07:37:26 -0800
Rob Clark <robdclark@gmail.com> wrote:

> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >
> > On Tue, 21 Feb 2023 09:53:56 -0800
> > Rob Clark <robdclark@gmail.com> wrote:
> >  
> > > On Tue, Feb 21, 2023 at 8:48 AM Luben Tuikov <luben.tuikov@amd.com> wrote:  
> > > >
> > > > On 2023-02-20 11:14, Rob Clark wrote:  
> > > > > On Mon, Feb 20, 2023 at 12:53 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:  
> > > > >>
> > > > >> On Sat, 18 Feb 2023 13:15:49 -0800
> > > > >> Rob Clark <robdclark@gmail.com> wrote:
> > > > >>  
> > > > >>> From: Rob Clark <robdclark@chromium.org>
> > > > >>>
> > > > >>> Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent
> > > > >>> wait (as opposed to a "housekeeping" wait to know when to cleanup after
> > > > >>> some work has completed).  Usermode components of GPU driver stacks
> > > > >>> often poll() on fence fd's to know when it is safe to do things like
> > > > >>> free or reuse a buffer, but they can also poll() on a fence fd when
> > > > >>> waiting to read back results from the GPU.  The EPOLLPRI/POLLPRI flag
> > > > >>> lets the kernel differentiate these two cases.
> > > > >>>
> > > > >>> Signed-off-by: Rob Clark <robdclark@chromium.org>  
> > > > >>
> > > > >> Hi,
> > > > >>
> > > > >> where would the UAPI documentation of this go?
> > > > >> It seems to be missing.  
> > > > >
> > > > > Good question, I am not sure.  The poll() man page has a description,
> > > > > but my usage doesn't fit that _exactly_ (but OTOH the description is a
> > > > > bit vague).
> > > > >  
> > > > >> If a Wayland compositor is polling application fences to know which
> > > > >> client buffer to use in its rendering, should the compositor poll with
> > > > >> PRI or not? If a compositor polls with PRI, then all fences from all
> > > > >> applications would always be PRI. Would that be harmful somehow or
> > > > >> would it be beneficial?  
> > > > >
> > > > > I think a compositor would rather use the deadline ioctl and then poll
> > > > > without PRI.  Otherwise you are giving an urgency signal to the fence
> > > > > signaller which might not necessarily be needed.
> > > > >
> > > > > The places where I expect PRI to be useful is more in mesa (things
> > > > > like glFinish(), readpix, and other similar sorts of blocking APIs)  
> > > > Hi,
> > > >
> > > > Hmm, but then user-space could do the opposite, namely, submit work as usual--never
> > > > using the SET_DEADLINE ioctl, and then at the end, poll using (E)POLLPRI. That seems
> > > > like a possible usage pattern, unintended--maybe, but possible. Do we want to discourage
> > > > this? Wouldn't SET_DEADLINE be enough? I mean, one can call SET_DEADLINE with the current
> > > > time, and then wouldn't that be equivalent to (E)POLLPRI?  
> > >
> > > Yeah, (E)POLLPRI isn't strictly needed if we have SET_DEADLINE.  It is
> > > slightly more convenient if you want an immediate deadline (single
> > > syscall instead of two), but not strictly needed.  OTOH it piggy-backs
> > > on existing UABI.  
> >
> > In that case, I would be conservative, and not add the POLLPRI
> > semantics. An UAPI addition that is not strictly needed and somewhat
> > unclear if it violates any design principles is best not done, until it
> > is proven to be beneficial.
> >
> > Besides, a Wayland compositor does not necessary need to add the fd
> > to its main event loop for poll. It could just SET_DEADLINE, and then
> > when it renders simply check if the fence passed or not already. Not
> > polling means the compositor does not need to wake up at the moment the
> > fence signals to just record a flag.  
> 
> poll(POLLPRI) isn't intended for wayland.. but is a thing I want in
> mesa for fence waits.  I _could_ use SET_DEADLINE but it is two
> syscalls and correspondingly more code ;-)

But is it actually beneficial? "More code" seems quite irrelevant.

Would there be a hundred or more of those per frame? Or would it be
always limited to one or two? Or totally depend on what the application
is doing? Is it a significant impact?

> > On another matter, if the application uses SET_DEADLINE with one
> > timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > another timestamp, what should happen?  
> 
> The expectation is that many deadline hints can be set on a fence.
> The fence signaller should track the soonest deadline.

You need to document that as UAPI, since it is observable to userspace.
It would be bad if drivers or subsystems would differ in behaviour.


Thanks,
pq

Rob Clark Feb. 23, 2023, 6:51 p.m. UTC | #14

On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>
> On Wed, 22 Feb 2023 07:37:26 -0800
> Rob Clark <robdclark@gmail.com> wrote:
>
> > On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > >
> > > On Tue, 21 Feb 2023 09:53:56 -0800
> > > Rob Clark <robdclark@gmail.com> wrote:
> > >
> > > > On Tue, Feb 21, 2023 at 8:48 AM Luben Tuikov <luben.tuikov@amd.com> wrote:
> > > > >
> > > > > On 2023-02-20 11:14, Rob Clark wrote:
> > > > > > On Mon, Feb 20, 2023 at 12:53 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > > >>
> > > > > >> On Sat, 18 Feb 2023 13:15:49 -0800
> > > > > >> Rob Clark <robdclark@gmail.com> wrote:
> > > > > >>
> > > > > >>> From: Rob Clark <robdclark@chromium.org>
> > > > > >>>
> > > > > >>> Allow userspace to use the EPOLLPRI/POLLPRI flag to indicate an urgent
> > > > > >>> wait (as opposed to a "housekeeping" wait to know when to cleanup after
> > > > > >>> some work has completed).  Usermode components of GPU driver stacks
> > > > > >>> often poll() on fence fd's to know when it is safe to do things like
> > > > > >>> free or reuse a buffer, but they can also poll() on a fence fd when
> > > > > >>> waiting to read back results from the GPU.  The EPOLLPRI/POLLPRI flag
> > > > > >>> lets the kernel differentiate these two cases.
> > > > > >>>
> > > > > >>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > > >>
> > > > > >> Hi,
> > > > > >>
> > > > > >> where would the UAPI documentation of this go?
> > > > > >> It seems to be missing.
> > > > > >
> > > > > > Good question, I am not sure.  The poll() man page has a description,
> > > > > > but my usage doesn't fit that _exactly_ (but OTOH the description is a
> > > > > > bit vague).
> > > > > >
> > > > > >> If a Wayland compositor is polling application fences to know which
> > > > > >> client buffer to use in its rendering, should the compositor poll with
> > > > > >> PRI or not? If a compositor polls with PRI, then all fences from all
> > > > > >> applications would always be PRI. Would that be harmful somehow or
> > > > > >> would it be beneficial?
> > > > > >
> > > > > > I think a compositor would rather use the deadline ioctl and then poll
> > > > > > without PRI.  Otherwise you are giving an urgency signal to the fence
> > > > > > signaller which might not necessarily be needed.
> > > > > >
> > > > > > The places where I expect PRI to be useful is more in mesa (things
> > > > > > like glFinish(), readpix, and other similar sorts of blocking APIs)
> > > > > Hi,
> > > > >
> > > > > Hmm, but then user-space could do the opposite, namely, submit work as usual--never
> > > > > using the SET_DEADLINE ioctl, and then at the end, poll using (E)POLLPRI. That seems
> > > > > like a possible usage pattern, unintended--maybe, but possible. Do we want to discourage
> > > > > this? Wouldn't SET_DEADLINE be enough? I mean, one can call SET_DEADLINE with the current
> > > > > time, and then wouldn't that be equivalent to (E)POLLPRI?
> > > >
> > > > Yeah, (E)POLLPRI isn't strictly needed if we have SET_DEADLINE.  It is
> > > > slightly more convenient if you want an immediate deadline (single
> > > > syscall instead of two), but not strictly needed.  OTOH it piggy-backs
> > > > on existing UABI.
> > >
> > > In that case, I would be conservative, and not add the POLLPRI
> > > semantics. An UAPI addition that is not strictly needed and somewhat
> > > unclear if it violates any design principles is best not done, until it
> > > is proven to be beneficial.
> > >
> > > Besides, a Wayland compositor does not necessary need to add the fd
> > > to its main event loop for poll. It could just SET_DEADLINE, and then
> > > when it renders simply check if the fence passed or not already. Not
> > > polling means the compositor does not need to wake up at the moment the
> > > fence signals to just record a flag.
> >
> > poll(POLLPRI) isn't intended for wayland.. but is a thing I want in
> > mesa for fence waits.  I _could_ use SET_DEADLINE but it is two
> > syscalls and correspondingly more code ;-)
>
> But is it actually beneficial? "More code" seems quite irrelevant.
>
> Would there be a hundred or more of those per frame? Or would it be
> always limited to one or two? Or totally depend on what the application
> is doing? Is it a significant impact?

In general, any time the CPU is waiting on the GPU, you have already
lost.  So I don't think the extra syscall is too much of a problem.
Just less convenient.

> > > On another matter, if the application uses SET_DEADLINE with one
> > > timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > > another timestamp, what should happen?
> >
> > The expectation is that many deadline hints can be set on a fence.
> > The fence signaller should track the soonest deadline.
>
> You need to document that as UAPI, since it is observable to userspace.
> It would be bad if drivers or subsystems would differ in behaviour.
>

It is in the end a hint.  It is about giving the driver more
information so that it can make better choices.  But the driver is
even free to ignore it.  So maybe "expectation" is too strong of a
word.  Rather, any other behavior doesn't really make sense.  But it
could end up being dictated by how the hw and/or fw works.

BR,
-R

>
> Thanks,
> pq

Pekka Paalanen Feb. 24, 2023, 9:26 a.m. UTC | #15

On Thu, 23 Feb 2023 10:51:48 -0800
Rob Clark <robdclark@gmail.com> wrote:

> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >
> > On Wed, 22 Feb 2023 07:37:26 -0800
> > Rob Clark <robdclark@gmail.com> wrote:
> >  
> > > On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:  

...

> > > > On another matter, if the application uses SET_DEADLINE with one
> > > > timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > > > another timestamp, what should happen?  
> > >
> > > The expectation is that many deadline hints can be set on a fence.
> > > The fence signaller should track the soonest deadline.  
> >
> > You need to document that as UAPI, since it is observable to userspace.
> > It would be bad if drivers or subsystems would differ in behaviour.
> >  
> 
> It is in the end a hint.  It is about giving the driver more
> information so that it can make better choices.  But the driver is
> even free to ignore it.  So maybe "expectation" is too strong of a
> word.  Rather, any other behavior doesn't really make sense.  But it
> could end up being dictated by how the hw and/or fw works.

It will stop being a hint once it has been implemented and used in the
wild long enough. The kernel userspace regression rules make sure of
that.

See the topic of implementing triple-buffering in Mutter in order to
put more work to the GPU in order to have the GPU ramp up clocks in
order to not miss rendering deadlines. I don't think that patch set has
landed in Mutter upstream, but I hear distributions in downstream are
already carrying it.

https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1383
https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441

Granted, GPU clocks are just one side of that story it seems, and
triple-buffering may have other benefits.

If SET_DEADLINE would fix that problem without triple-buffering, it is
definitely userspace observable, expected and eventually required
behaviour.

Thanks,
pq

Tvrtko Ursulin Feb. 24, 2023, 9:41 a.m. UTC | #16

On 24/02/2023 09:26, Pekka Paalanen wrote:
> On Thu, 23 Feb 2023 10:51:48 -0800
> Rob Clark <robdclark@gmail.com> wrote:
> 
>> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>>>
>>> On Wed, 22 Feb 2023 07:37:26 -0800
>>> Rob Clark <robdclark@gmail.com> wrote:
>>>   
>>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> 
> ...
> 
>>>>> On another matter, if the application uses SET_DEADLINE with one
>>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
>>>>> another timestamp, what should happen?
>>>>
>>>> The expectation is that many deadline hints can be set on a fence.
>>>> The fence signaller should track the soonest deadline.
>>>
>>> You need to document that as UAPI, since it is observable to userspace.
>>> It would be bad if drivers or subsystems would differ in behaviour.
>>>   
>>
>> It is in the end a hint.  It is about giving the driver more
>> information so that it can make better choices.  But the driver is
>> even free to ignore it.  So maybe "expectation" is too strong of a
>> word.  Rather, any other behavior doesn't really make sense.  But it
>> could end up being dictated by how the hw and/or fw works.
> 
> It will stop being a hint once it has been implemented and used in the
> wild long enough. The kernel userspace regression rules make sure of
> that.

Yeah, tricky and maybe a gray area in this case. I think we eluded 
elsewhere in the thread that renaming the thing might be an option.

So maybe instead of deadline, which is a very strong word, use something 
along the lines of "present time hint", or "signalled time hint"? Maybe 
reads clumsy. Just throwing some ideas for a start.

Regards,

Tvrtko

> See the topic of implementing triple-buffering in Mutter in order to
> put more work to the GPU in order to have the GPU ramp up clocks in
> order to not miss rendering deadlines. I don't think that patch set has
> landed in Mutter upstream, but I hear distributions in downstream are
> already carrying it.
> 
> https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1383
> https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1441
> 
> Granted, GPU clocks are just one side of that story it seems, and
> triple-buffering may have other benefits.
> 
> If SET_DEADLINE would fix that problem without triple-buffering, it is
> definitely userspace observable, expected and eventually required
> behaviour.
> 
> 
> Thanks,
> pq

Pekka Paalanen Feb. 24, 2023, 10:24 a.m. UTC | #17

On Fri, 24 Feb 2023 09:41:46 +0000
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> On 24/02/2023 09:26, Pekka Paalanen wrote:
> > On Thu, 23 Feb 2023 10:51:48 -0800
> > Rob Clark <robdclark@gmail.com> wrote:
> >   
> >> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:  
> >>>
> >>> On Wed, 22 Feb 2023 07:37:26 -0800
> >>> Rob Clark <robdclark@gmail.com> wrote:
> >>>     
> >>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:  
> > 
> > ...
> >   
> >>>>> On another matter, if the application uses SET_DEADLINE with one
> >>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> >>>>> another timestamp, what should happen?  
> >>>>
> >>>> The expectation is that many deadline hints can be set on a fence.
> >>>> The fence signaller should track the soonest deadline.  
> >>>
> >>> You need to document that as UAPI, since it is observable to userspace.
> >>> It would be bad if drivers or subsystems would differ in behaviour.
> >>>     
> >>
> >> It is in the end a hint.  It is about giving the driver more
> >> information so that it can make better choices.  But the driver is
> >> even free to ignore it.  So maybe "expectation" is too strong of a
> >> word.  Rather, any other behavior doesn't really make sense.  But it
> >> could end up being dictated by how the hw and/or fw works.  
> > 
> > It will stop being a hint once it has been implemented and used in the
> > wild long enough. The kernel userspace regression rules make sure of
> > that.  
> 
> Yeah, tricky and maybe a gray area in this case. I think we eluded 
> elsewhere in the thread that renaming the thing might be an option.
> 
> So maybe instead of deadline, which is a very strong word, use something 
> along the lines of "present time hint", or "signalled time hint"? Maybe 
> reads clumsy. Just throwing some ideas for a start.

You can try, but I fear that if it ever changes behaviour and
someone notices that, it's labelled as a kernel regression. I don't
think documentation has ever been the authoritative definition of UABI
in Linux, it just guides drivers and userspace towards a common
understanding and common usage patterns.

So even if the UABI contract is not documented (ugh), you need to be
prepared to set the UABI contract through kernel implementation.

If you do not document the UABI contract, then different drivers are
likely to implement it differently, leading to differing behaviour.
Also userspace will invent wild ways to abuse the UABI if there is no
documentation guiding it on proper use. If userspace or end users
observe different behaviour, that's bad even if it's not a regression.

I don't like the situation either, but it is what it is. UABI stability
trumps everything regardless of whether it was documented or not.

I bet userspace is going to use this as a "make it faster, make it
hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
library that stamps any and all fences with an expired deadline to
just squeeze out a little more through some weird side-effect.

Well, that's hopefully overboard in scaring, but in the end, I would
like to see UABI documented so I can have a feeling of what it is for
and how it was intended to be used. That's all.

Thanks,
pq

Tvrtko Ursulin Feb. 24, 2023, 10:50 a.m. UTC | #18

On 24/02/2023 10:24, Pekka Paalanen wrote:
> On Fri, 24 Feb 2023 09:41:46 +0000
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
>> On 24/02/2023 09:26, Pekka Paalanen wrote:
>>> On Thu, 23 Feb 2023 10:51:48 -0800
>>> Rob Clark <robdclark@gmail.com> wrote:
>>>    
>>>> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>>>>>
>>>>> On Wed, 22 Feb 2023 07:37:26 -0800
>>>>> Rob Clark <robdclark@gmail.com> wrote:
>>>>>      
>>>>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>>>
>>> ...
>>>    
>>>>>>> On another matter, if the application uses SET_DEADLINE with one
>>>>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
>>>>>>> another timestamp, what should happen?
>>>>>>
>>>>>> The expectation is that many deadline hints can be set on a fence.
>>>>>> The fence signaller should track the soonest deadline.
>>>>>
>>>>> You need to document that as UAPI, since it is observable to userspace.
>>>>> It would be bad if drivers or subsystems would differ in behaviour.
>>>>>      
>>>>
>>>> It is in the end a hint.  It is about giving the driver more
>>>> information so that it can make better choices.  But the driver is
>>>> even free to ignore it.  So maybe "expectation" is too strong of a
>>>> word.  Rather, any other behavior doesn't really make sense.  But it
>>>> could end up being dictated by how the hw and/or fw works.
>>>
>>> It will stop being a hint once it has been implemented and used in the
>>> wild long enough. The kernel userspace regression rules make sure of
>>> that.
>>
>> Yeah, tricky and maybe a gray area in this case. I think we eluded
>> elsewhere in the thread that renaming the thing might be an option.
>>
>> So maybe instead of deadline, which is a very strong word, use something
>> along the lines of "present time hint", or "signalled time hint"? Maybe
>> reads clumsy. Just throwing some ideas for a start.
> 
> You can try, but I fear that if it ever changes behaviour and
> someone notices that, it's labelled as a kernel regression. I don't
> think documentation has ever been the authoritative definition of UABI
> in Linux, it just guides drivers and userspace towards a common
> understanding and common usage patterns.
> 
> So even if the UABI contract is not documented (ugh), you need to be
> prepared to set the UABI contract through kernel implementation.

To be the devil's advocate it probably wouldn't be an ABI regression but 
just an regression. Same way as what nice(2) priorities mean hasn't 
always been the same over the years, I don't think there is a strict 
contract.

Having said that, it may be different with latency sensitive stuff such 
as UIs though since it is very observable and can be very painful to users.

> If you do not document the UABI contract, then different drivers are
> likely to implement it differently, leading to differing behaviour.
> Also userspace will invent wild ways to abuse the UABI if there is no
> documentation guiding it on proper use. If userspace or end users
> observe different behaviour, that's bad even if it's not a regression.
> 
> I don't like the situation either, but it is what it is. UABI stability
> trumps everything regardless of whether it was documented or not.
> 
> I bet userspace is going to use this as a "make it faster, make it
> hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> library that stamps any and all fences with an expired deadline to
> just squeeze out a little more through some weird side-effect.
> 
> Well, that's hopefully overboard in scaring, but in the end, I would
> like to see UABI documented so I can have a feeling of what it is for
> and how it was intended to be used. That's all.

We share the same concern. If you read elsewhere in these threads you 
will notice I have been calling this an "arms race". If the ability to 
make yourself go faster does not required additional privilege I also 
worry everyone will do it at which point it becomes pointless. So yes, I 
do share this concern about exposing any of this as an unprivileged uapi.

Is it possible to limit access to only compositors in some sane way? 
Sounds tricky when dma-fence should be disconnected from DRM..

Regards,

Tvrtko

Pekka Paalanen Feb. 24, 2023, 11 a.m. UTC | #19

On Fri, 24 Feb 2023 10:50:51 +0000
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> On 24/02/2023 10:24, Pekka Paalanen wrote:
> > On Fri, 24 Feb 2023 09:41:46 +0000
> > Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> >   
> >> On 24/02/2023 09:26, Pekka Paalanen wrote:  
> >>> On Thu, 23 Feb 2023 10:51:48 -0800
> >>> Rob Clark <robdclark@gmail.com> wrote:
> >>>      
> >>>> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:  
> >>>>>
> >>>>> On Wed, 22 Feb 2023 07:37:26 -0800
> >>>>> Rob Clark <robdclark@gmail.com> wrote:
> >>>>>        
> >>>>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:  
> >>>
> >>> ...
> >>>      
> >>>>>>> On another matter, if the application uses SET_DEADLINE with one
> >>>>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> >>>>>>> another timestamp, what should happen?  
> >>>>>>
> >>>>>> The expectation is that many deadline hints can be set on a fence.
> >>>>>> The fence signaller should track the soonest deadline.  
> >>>>>
> >>>>> You need to document that as UAPI, since it is observable to userspace.
> >>>>> It would be bad if drivers or subsystems would differ in behaviour.
> >>>>>        
> >>>>
> >>>> It is in the end a hint.  It is about giving the driver more
> >>>> information so that it can make better choices.  But the driver is
> >>>> even free to ignore it.  So maybe "expectation" is too strong of a
> >>>> word.  Rather, any other behavior doesn't really make sense.  But it
> >>>> could end up being dictated by how the hw and/or fw works.  
> >>>
> >>> It will stop being a hint once it has been implemented and used in the
> >>> wild long enough. The kernel userspace regression rules make sure of
> >>> that.  
> >>
> >> Yeah, tricky and maybe a gray area in this case. I think we eluded
> >> elsewhere in the thread that renaming the thing might be an option.
> >>
> >> So maybe instead of deadline, which is a very strong word, use something
> >> along the lines of "present time hint", or "signalled time hint"? Maybe
> >> reads clumsy. Just throwing some ideas for a start.  
> > 
> > You can try, but I fear that if it ever changes behaviour and
> > someone notices that, it's labelled as a kernel regression. I don't
> > think documentation has ever been the authoritative definition of UABI
> > in Linux, it just guides drivers and userspace towards a common
> > understanding and common usage patterns.
> > 
> > So even if the UABI contract is not documented (ugh), you need to be
> > prepared to set the UABI contract through kernel implementation.  
> 
> To be the devil's advocate it probably wouldn't be an ABI regression but 
> just an regression. Same way as what nice(2) priorities mean hasn't 
> always been the same over the years, I don't think there is a strict 
> contract.
> 
> Having said that, it may be different with latency sensitive stuff such 
> as UIs though since it is very observable and can be very painful to users.
> 
> > If you do not document the UABI contract, then different drivers are
> > likely to implement it differently, leading to differing behaviour.
> > Also userspace will invent wild ways to abuse the UABI if there is no
> > documentation guiding it on proper use. If userspace or end users
> > observe different behaviour, that's bad even if it's not a regression.
> > 
> > I don't like the situation either, but it is what it is. UABI stability
> > trumps everything regardless of whether it was documented or not.
> > 
> > I bet userspace is going to use this as a "make it faster, make it
> > hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> > library that stamps any and all fences with an expired deadline to
> > just squeeze out a little more through some weird side-effect.
> > 
> > Well, that's hopefully overboard in scaring, but in the end, I would
> > like to see UABI documented so I can have a feeling of what it is for
> > and how it was intended to be used. That's all.  
> 
> We share the same concern. If you read elsewhere in these threads you 
> will notice I have been calling this an "arms race". If the ability to 
> make yourself go faster does not required additional privilege I also 
> worry everyone will do it at which point it becomes pointless. So yes, I 
> do share this concern about exposing any of this as an unprivileged uapi.
> 
> Is it possible to limit access to only compositors in some sane way? 
> Sounds tricky when dma-fence should be disconnected from DRM..

Maybe it's not that bad in this particular case, because we are talking
only about boosting GPU clocks which benefits everyone (except
battery life) and it does not penalize other programs like e.g.
job priorities do.

Drivers are not going to use the deadline for scheduling priorities,
right? I don't recall seeing any mention of that.

...right?


Thanks,
pq

Tvrtko Ursulin Feb. 24, 2023, 11:37 a.m. UTC | #20

On 24/02/2023 11:00, Pekka Paalanen wrote:
> On Fri, 24 Feb 2023 10:50:51 +0000
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
>> On 24/02/2023 10:24, Pekka Paalanen wrote:
>>> On Fri, 24 Feb 2023 09:41:46 +0000
>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
>>>    
>>>> On 24/02/2023 09:26, Pekka Paalanen wrote:
>>>>> On Thu, 23 Feb 2023 10:51:48 -0800
>>>>> Rob Clark <robdclark@gmail.com> wrote:
>>>>>       
>>>>>> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>>>>>>>
>>>>>>> On Wed, 22 Feb 2023 07:37:26 -0800
>>>>>>> Rob Clark <robdclark@gmail.com> wrote:
>>>>>>>         
>>>>>>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>>>>>
>>>>> ...
>>>>>       
>>>>>>>>> On another matter, if the application uses SET_DEADLINE with one
>>>>>>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
>>>>>>>>> another timestamp, what should happen?
>>>>>>>>
>>>>>>>> The expectation is that many deadline hints can be set on a fence.
>>>>>>>> The fence signaller should track the soonest deadline.
>>>>>>>
>>>>>>> You need to document that as UAPI, since it is observable to userspace.
>>>>>>> It would be bad if drivers or subsystems would differ in behaviour.
>>>>>>>         
>>>>>>
>>>>>> It is in the end a hint.  It is about giving the driver more
>>>>>> information so that it can make better choices.  But the driver is
>>>>>> even free to ignore it.  So maybe "expectation" is too strong of a
>>>>>> word.  Rather, any other behavior doesn't really make sense.  But it
>>>>>> could end up being dictated by how the hw and/or fw works.
>>>>>
>>>>> It will stop being a hint once it has been implemented and used in the
>>>>> wild long enough. The kernel userspace regression rules make sure of
>>>>> that.
>>>>
>>>> Yeah, tricky and maybe a gray area in this case. I think we eluded
>>>> elsewhere in the thread that renaming the thing might be an option.
>>>>
>>>> So maybe instead of deadline, which is a very strong word, use something
>>>> along the lines of "present time hint", or "signalled time hint"? Maybe
>>>> reads clumsy. Just throwing some ideas for a start.
>>>
>>> You can try, but I fear that if it ever changes behaviour and
>>> someone notices that, it's labelled as a kernel regression. I don't
>>> think documentation has ever been the authoritative definition of UABI
>>> in Linux, it just guides drivers and userspace towards a common
>>> understanding and common usage patterns.
>>>
>>> So even if the UABI contract is not documented (ugh), you need to be
>>> prepared to set the UABI contract through kernel implementation.
>>
>> To be the devil's advocate it probably wouldn't be an ABI regression but
>> just an regression. Same way as what nice(2) priorities mean hasn't
>> always been the same over the years, I don't think there is a strict
>> contract.
>>
>> Having said that, it may be different with latency sensitive stuff such
>> as UIs though since it is very observable and can be very painful to users.
>>
>>> If you do not document the UABI contract, then different drivers are
>>> likely to implement it differently, leading to differing behaviour.
>>> Also userspace will invent wild ways to abuse the UABI if there is no
>>> documentation guiding it on proper use. If userspace or end users
>>> observe different behaviour, that's bad even if it's not a regression.
>>>
>>> I don't like the situation either, but it is what it is. UABI stability
>>> trumps everything regardless of whether it was documented or not.
>>>
>>> I bet userspace is going to use this as a "make it faster, make it
>>> hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
>>> library that stamps any and all fences with an expired deadline to
>>> just squeeze out a little more through some weird side-effect.
>>>
>>> Well, that's hopefully overboard in scaring, but in the end, I would
>>> like to see UABI documented so I can have a feeling of what it is for
>>> and how it was intended to be used. That's all.
>>
>> We share the same concern. If you read elsewhere in these threads you
>> will notice I have been calling this an "arms race". If the ability to
>> make yourself go faster does not required additional privilege I also
>> worry everyone will do it at which point it becomes pointless. So yes, I
>> do share this concern about exposing any of this as an unprivileged uapi.
>>
>> Is it possible to limit access to only compositors in some sane way?
>> Sounds tricky when dma-fence should be disconnected from DRM..
> 
> Maybe it's not that bad in this particular case, because we are talking
> only about boosting GPU clocks which benefits everyone (except
> battery life) and it does not penalize other programs like e.g.
> job priorities do.

Apart from efficiency that you mentioned, which does not always favor 
higher clocks, sometimes thermal budget is also shared between CPU and 
GPU. So more GPU clocks can mean fewer CPU clocks. It's really hard to 
make optimal choices without the full coordination between both schedulers.

But that is even not the main point, which is that if everyone sets the 
immediate deadline then having the deadline API is a bit pointless. For 
instance there is a reason negative nice needs CAP_SYS_ADMIN.

However Rob has also pointed out the existence of uclamp.min via 
sched_setattr which is unprivileged and can influence frequency 
selection in the CPU world, so I conceded on that point. If CPU world 
has accepted it so can we I guess.

So IMO we are back to whether we can agree defining it is a hint is good 
enough, be in via the name of the ioctl/flag itself or via documentation.

> Drivers are not going to use the deadline for scheduling priorities,
> right? I don't recall seeing any mention of that.
> 
> ...right?

I wouldn't have thought it would be beneficial to preclude that, or 
assume what drivers would do with the info to begin with.

For instance in i915 we almost had a deadline based scheduler which was 
much fairer than the current priority sorted fifo and in an ideal world 
we would either revive or re-implement that idea. In which case 
considering the fence deadline would naturally slot in and give true 
integration with compositor deadlines (not just boost clocks and pray it 
helps).

Regards,

Tvrtko

Luben Tuikov Feb. 24, 2023, 3:26 p.m. UTC | #21

On 2023-02-24 06:37, Tvrtko Ursulin wrote:
> 
> On 24/02/2023 11:00, Pekka Paalanen wrote:
>> On Fri, 24 Feb 2023 10:50:51 +0000
>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
>>
>>> On 24/02/2023 10:24, Pekka Paalanen wrote:
>>>> On Fri, 24 Feb 2023 09:41:46 +0000
>>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
>>>>    
>>>>> On 24/02/2023 09:26, Pekka Paalanen wrote:
>>>>>> On Thu, 23 Feb 2023 10:51:48 -0800
>>>>>> Rob Clark <robdclark@gmail.com> wrote:
>>>>>>       
>>>>>>> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Wed, 22 Feb 2023 07:37:26 -0800
>>>>>>>> Rob Clark <robdclark@gmail.com> wrote:
>>>>>>>>         
>>>>>>>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>>>>>>
>>>>>> ...
>>>>>>       
>>>>>>>>>> On another matter, if the application uses SET_DEADLINE with one
>>>>>>>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
>>>>>>>>>> another timestamp, what should happen?
>>>>>>>>>
>>>>>>>>> The expectation is that many deadline hints can be set on a fence.
>>>>>>>>> The fence signaller should track the soonest deadline.
>>>>>>>>
>>>>>>>> You need to document that as UAPI, since it is observable to userspace.
>>>>>>>> It would be bad if drivers or subsystems would differ in behaviour.
>>>>>>>>         
>>>>>>>
>>>>>>> It is in the end a hint.  It is about giving the driver more
>>>>>>> information so that it can make better choices.  But the driver is
>>>>>>> even free to ignore it.  So maybe "expectation" is too strong of a
>>>>>>> word.  Rather, any other behavior doesn't really make sense.  But it
>>>>>>> could end up being dictated by how the hw and/or fw works.
>>>>>>
>>>>>> It will stop being a hint once it has been implemented and used in the
>>>>>> wild long enough. The kernel userspace regression rules make sure of
>>>>>> that.
>>>>>
>>>>> Yeah, tricky and maybe a gray area in this case. I think we eluded
>>>>> elsewhere in the thread that renaming the thing might be an option.
>>>>>
>>>>> So maybe instead of deadline, which is a very strong word, use something
>>>>> along the lines of "present time hint", or "signalled time hint"? Maybe
>>>>> reads clumsy. Just throwing some ideas for a start.
>>>>
>>>> You can try, but I fear that if it ever changes behaviour and
>>>> someone notices that, it's labelled as a kernel regression. I don't
>>>> think documentation has ever been the authoritative definition of UABI
>>>> in Linux, it just guides drivers and userspace towards a common
>>>> understanding and common usage patterns.
>>>>
>>>> So even if the UABI contract is not documented (ugh), you need to be
>>>> prepared to set the UABI contract through kernel implementation.
>>>
>>> To be the devil's advocate it probably wouldn't be an ABI regression but
>>> just an regression. Same way as what nice(2) priorities mean hasn't
>>> always been the same over the years, I don't think there is a strict
>>> contract.
>>>
>>> Having said that, it may be different with latency sensitive stuff such
>>> as UIs though since it is very observable and can be very painful to users.
>>>
>>>> If you do not document the UABI contract, then different drivers are
>>>> likely to implement it differently, leading to differing behaviour.
>>>> Also userspace will invent wild ways to abuse the UABI if there is no
>>>> documentation guiding it on proper use. If userspace or end users
>>>> observe different behaviour, that's bad even if it's not a regression.
>>>>
>>>> I don't like the situation either, but it is what it is. UABI stability
>>>> trumps everything regardless of whether it was documented or not.
>>>>
>>>> I bet userspace is going to use this as a "make it faster, make it
>>>> hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
>>>> library that stamps any and all fences with an expired deadline to
>>>> just squeeze out a little more through some weird side-effect.
>>>>
>>>> Well, that's hopefully overboard in scaring, but in the end, I would
>>>> like to see UABI documented so I can have a feeling of what it is for
>>>> and how it was intended to be used. That's all.
>>>
>>> We share the same concern. If you read elsewhere in these threads you
>>> will notice I have been calling this an "arms race". If the ability to
>>> make yourself go faster does not required additional privilege I also
>>> worry everyone will do it at which point it becomes pointless. So yes, I
>>> do share this concern about exposing any of this as an unprivileged uapi.
>>>
>>> Is it possible to limit access to only compositors in some sane way?
>>> Sounds tricky when dma-fence should be disconnected from DRM..
>>
>> Maybe it's not that bad in this particular case, because we are talking
>> only about boosting GPU clocks which benefits everyone (except
>> battery life) and it does not penalize other programs like e.g.
>> job priorities do.
> 
> Apart from efficiency that you mentioned, which does not always favor 
> higher clocks, sometimes thermal budget is also shared between CPU and 
> GPU. So more GPU clocks can mean fewer CPU clocks. It's really hard to 
> make optimal choices without the full coordination between both schedulers.
> 
> But that is even not the main point, which is that if everyone sets the 
> immediate deadline then having the deadline API is a bit pointless. For 
> instance there is a reason negative nice needs CAP_SYS_ADMIN.
> 
> However Rob has also pointed out the existence of uclamp.min via 
> sched_setattr which is unprivileged and can influence frequency 
> selection in the CPU world, so I conceded on that point. If CPU world 
> has accepted it so can we I guess.
> 
> So IMO we are back to whether we can agree defining it is a hint is good 
> enough, be in via the name of the ioctl/flag itself or via documentation.
> 
>> Drivers are not going to use the deadline for scheduling priorities,
>> right? I don't recall seeing any mention of that.
>>
>> ...right?
> 
> I wouldn't have thought it would be beneficial to preclude that, or 
> assume what drivers would do with the info to begin with.
> 
> For instance in i915 we almost had a deadline based scheduler which was 
> much fairer than the current priority sorted fifo and in an ideal world 
> we would either revive or re-implement that idea. In which case 
> considering the fence deadline would naturally slot in and give true 
> integration with compositor deadlines (not just boost clocks and pray it 
> helps).
How is user-space to decide whether to use ioctl(SET_DEADLINE) or
poll(POLLPRI)?

Rob Clark Feb. 24, 2023, 4:59 p.m. UTC | #22

On Fri, Feb 24, 2023 at 2:24 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>
> On Fri, 24 Feb 2023 09:41:46 +0000
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
>
> > On 24/02/2023 09:26, Pekka Paalanen wrote:
> > > On Thu, 23 Feb 2023 10:51:48 -0800
> > > Rob Clark <robdclark@gmail.com> wrote:
> > >
> > >> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > >>>
> > >>> On Wed, 22 Feb 2023 07:37:26 -0800
> > >>> Rob Clark <robdclark@gmail.com> wrote:
> > >>>
> > >>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > >
> > > ...
> > >
> > >>>>> On another matter, if the application uses SET_DEADLINE with one
> > >>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > >>>>> another timestamp, what should happen?
> > >>>>
> > >>>> The expectation is that many deadline hints can be set on a fence.
> > >>>> The fence signaller should track the soonest deadline.
> > >>>
> > >>> You need to document that as UAPI, since it is observable to userspace.
> > >>> It would be bad if drivers or subsystems would differ in behaviour.
> > >>>
> > >>
> > >> It is in the end a hint.  It is about giving the driver more
> > >> information so that it can make better choices.  But the driver is
> > >> even free to ignore it.  So maybe "expectation" is too strong of a
> > >> word.  Rather, any other behavior doesn't really make sense.  But it
> > >> could end up being dictated by how the hw and/or fw works.
> > >
> > > It will stop being a hint once it has been implemented and used in the
> > > wild long enough. The kernel userspace regression rules make sure of
> > > that.
> >
> > Yeah, tricky and maybe a gray area in this case. I think we eluded
> > elsewhere in the thread that renaming the thing might be an option.
> >
> > So maybe instead of deadline, which is a very strong word, use something
> > along the lines of "present time hint", or "signalled time hint"? Maybe
> > reads clumsy. Just throwing some ideas for a start.
>
> You can try, but I fear that if it ever changes behaviour and
> someone notices that, it's labelled as a kernel regression. I don't
> think documentation has ever been the authoritative definition of UABI
> in Linux, it just guides drivers and userspace towards a common
> understanding and common usage patterns.
>
> So even if the UABI contract is not documented (ugh), you need to be
> prepared to set the UABI contract through kernel implementation.
>
> If you do not document the UABI contract, then different drivers are
> likely to implement it differently, leading to differing behaviour.
> Also userspace will invent wild ways to abuse the UABI if there is no
> documentation guiding it on proper use. If userspace or end users
> observe different behaviour, that's bad even if it's not a regression.
>
> I don't like the situation either, but it is what it is. UABI stability
> trumps everything regardless of whether it was documented or not.
>
> I bet userspace is going to use this as a "make it faster, make it
> hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> library that stamps any and all fences with an expired deadline to
> just squeeze out a little more through some weird side-effect.

Ok, maybe we can rename the SET_DEADLINE ioctl to SPACEBAR_HEATER ;-)

BR,
-R

> Well, that's hopefully overboard in scaring, but in the end, I would
> like to see UABI documented so I can have a feeling of what it is for
> and how it was intended to be used. That's all.
>
>
> Thanks,
> pq

Rob Clark Feb. 24, 2023, 5:59 p.m. UTC | #23

On Fri, Feb 24, 2023 at 7:27 AM Luben Tuikov <luben.tuikov@amd.com> wrote:
>
> On 2023-02-24 06:37, Tvrtko Ursulin wrote:
> >
> > On 24/02/2023 11:00, Pekka Paalanen wrote:
> >> On Fri, 24 Feb 2023 10:50:51 +0000
> >> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> >>
> >>> On 24/02/2023 10:24, Pekka Paalanen wrote:
> >>>> On Fri, 24 Feb 2023 09:41:46 +0000
> >>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> >>>>
> >>>>> On 24/02/2023 09:26, Pekka Paalanen wrote:
> >>>>>> On Thu, 23 Feb 2023 10:51:48 -0800
> >>>>>> Rob Clark <robdclark@gmail.com> wrote:
> >>>>>>
> >>>>>>> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> On Wed, 22 Feb 2023 07:37:26 -0800
> >>>>>>>> Rob Clark <robdclark@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >>>>>>
> >>>>>> ...
> >>>>>>
> >>>>>>>>>> On another matter, if the application uses SET_DEADLINE with one
> >>>>>>>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> >>>>>>>>>> another timestamp, what should happen?
> >>>>>>>>>
> >>>>>>>>> The expectation is that many deadline hints can be set on a fence.
> >>>>>>>>> The fence signaller should track the soonest deadline.
> >>>>>>>>
> >>>>>>>> You need to document that as UAPI, since it is observable to userspace.
> >>>>>>>> It would be bad if drivers or subsystems would differ in behaviour.
> >>>>>>>>
> >>>>>>>
> >>>>>>> It is in the end a hint.  It is about giving the driver more
> >>>>>>> information so that it can make better choices.  But the driver is
> >>>>>>> even free to ignore it.  So maybe "expectation" is too strong of a
> >>>>>>> word.  Rather, any other behavior doesn't really make sense.  But it
> >>>>>>> could end up being dictated by how the hw and/or fw works.
> >>>>>>
> >>>>>> It will stop being a hint once it has been implemented and used in the
> >>>>>> wild long enough. The kernel userspace regression rules make sure of
> >>>>>> that.
> >>>>>
> >>>>> Yeah, tricky and maybe a gray area in this case. I think we eluded
> >>>>> elsewhere in the thread that renaming the thing might be an option.
> >>>>>
> >>>>> So maybe instead of deadline, which is a very strong word, use something
> >>>>> along the lines of "present time hint", or "signalled time hint"? Maybe
> >>>>> reads clumsy. Just throwing some ideas for a start.
> >>>>
> >>>> You can try, but I fear that if it ever changes behaviour and
> >>>> someone notices that, it's labelled as a kernel regression. I don't
> >>>> think documentation has ever been the authoritative definition of UABI
> >>>> in Linux, it just guides drivers and userspace towards a common
> >>>> understanding and common usage patterns.
> >>>>
> >>>> So even if the UABI contract is not documented (ugh), you need to be
> >>>> prepared to set the UABI contract through kernel implementation.
> >>>
> >>> To be the devil's advocate it probably wouldn't be an ABI regression but
> >>> just an regression. Same way as what nice(2) priorities mean hasn't
> >>> always been the same over the years, I don't think there is a strict
> >>> contract.
> >>>
> >>> Having said that, it may be different with latency sensitive stuff such
> >>> as UIs though since it is very observable and can be very painful to users.
> >>>
> >>>> If you do not document the UABI contract, then different drivers are
> >>>> likely to implement it differently, leading to differing behaviour.
> >>>> Also userspace will invent wild ways to abuse the UABI if there is no
> >>>> documentation guiding it on proper use. If userspace or end users
> >>>> observe different behaviour, that's bad even if it's not a regression.
> >>>>
> >>>> I don't like the situation either, but it is what it is. UABI stability
> >>>> trumps everything regardless of whether it was documented or not.
> >>>>
> >>>> I bet userspace is going to use this as a "make it faster, make it
> >>>> hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> >>>> library that stamps any and all fences with an expired deadline to
> >>>> just squeeze out a little more through some weird side-effect.
> >>>>
> >>>> Well, that's hopefully overboard in scaring, but in the end, I would
> >>>> like to see UABI documented so I can have a feeling of what it is for
> >>>> and how it was intended to be used. That's all.
> >>>
> >>> We share the same concern. If you read elsewhere in these threads you
> >>> will notice I have been calling this an "arms race". If the ability to
> >>> make yourself go faster does not required additional privilege I also
> >>> worry everyone will do it at which point it becomes pointless. So yes, I
> >>> do share this concern about exposing any of this as an unprivileged uapi.
> >>>
> >>> Is it possible to limit access to only compositors in some sane way?
> >>> Sounds tricky when dma-fence should be disconnected from DRM..
> >>
> >> Maybe it's not that bad in this particular case, because we are talking
> >> only about boosting GPU clocks which benefits everyone (except
> >> battery life) and it does not penalize other programs like e.g.
> >> job priorities do.
> >
> > Apart from efficiency that you mentioned, which does not always favor
> > higher clocks, sometimes thermal budget is also shared between CPU and
> > GPU. So more GPU clocks can mean fewer CPU clocks. It's really hard to
> > make optimal choices without the full coordination between both schedulers.
> >
> > But that is even not the main point, which is that if everyone sets the
> > immediate deadline then having the deadline API is a bit pointless. For
> > instance there is a reason negative nice needs CAP_SYS_ADMIN.
> >
> > However Rob has also pointed out the existence of uclamp.min via
> > sched_setattr which is unprivileged and can influence frequency
> > selection in the CPU world, so I conceded on that point. If CPU world
> > has accepted it so can we I guess.
> >
> > So IMO we are back to whether we can agree defining it is a hint is good
> > enough, be in via the name of the ioctl/flag itself or via documentation.
> >
> >> Drivers are not going to use the deadline for scheduling priorities,
> >> right? I don't recall seeing any mention of that.
> >>
> >> ...right?
> >
> > I wouldn't have thought it would be beneficial to preclude that, or
> > assume what drivers would do with the info to begin with.
> >
> > For instance in i915 we almost had a deadline based scheduler which was
> > much fairer than the current priority sorted fifo and in an ideal world
> > we would either revive or re-implement that idea. In which case
> > considering the fence deadline would naturally slot in and give true
> > integration with compositor deadlines (not just boost clocks and pray it
> > helps).
> How is user-space to decide whether to use ioctl(SET_DEADLINE) or
> poll(POLLPRI)?

Implementation of blocking gl/vk/cl APIs, like glFinish() would use
poll(POLLPRI).  It could also set an immediate deadline and then call
poll() without POLLPRI.

Other than compositors which do frame-pacing I expect the main usage
of either of these is mesa.

BR,
-R

> --
> Regards,
> Luben
>

Rob Clark Feb. 24, 2023, 7:44 p.m. UTC | #24

On Fri, Feb 24, 2023 at 2:24 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>
> On Fri, 24 Feb 2023 09:41:46 +0000
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
>
> > On 24/02/2023 09:26, Pekka Paalanen wrote:
> > > On Thu, 23 Feb 2023 10:51:48 -0800
> > > Rob Clark <robdclark@gmail.com> wrote:
> > >
> > >> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > >>>
> > >>> On Wed, 22 Feb 2023 07:37:26 -0800
> > >>> Rob Clark <robdclark@gmail.com> wrote:
> > >>>
> > >>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > >
> > > ...
> > >
> > >>>>> On another matter, if the application uses SET_DEADLINE with one
> > >>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > >>>>> another timestamp, what should happen?
> > >>>>
> > >>>> The expectation is that many deadline hints can be set on a fence.
> > >>>> The fence signaller should track the soonest deadline.
> > >>>
> > >>> You need to document that as UAPI, since it is observable to userspace.
> > >>> It would be bad if drivers or subsystems would differ in behaviour.
> > >>>
> > >>
> > >> It is in the end a hint.  It is about giving the driver more
> > >> information so that it can make better choices.  But the driver is
> > >> even free to ignore it.  So maybe "expectation" is too strong of a
> > >> word.  Rather, any other behavior doesn't really make sense.  But it
> > >> could end up being dictated by how the hw and/or fw works.
> > >
> > > It will stop being a hint once it has been implemented and used in the
> > > wild long enough. The kernel userspace regression rules make sure of
> > > that.
> >
> > Yeah, tricky and maybe a gray area in this case. I think we eluded
> > elsewhere in the thread that renaming the thing might be an option.
> >
> > So maybe instead of deadline, which is a very strong word, use something
> > along the lines of "present time hint", or "signalled time hint"? Maybe
> > reads clumsy. Just throwing some ideas for a start.
>
> You can try, but I fear that if it ever changes behaviour and
> someone notices that, it's labelled as a kernel regression. I don't
> think documentation has ever been the authoritative definition of UABI
> in Linux, it just guides drivers and userspace towards a common
> understanding and common usage patterns.
>
> So even if the UABI contract is not documented (ugh), you need to be
> prepared to set the UABI contract through kernel implementation.
>
> If you do not document the UABI contract, then different drivers are
> likely to implement it differently, leading to differing behaviour.
> Also userspace will invent wild ways to abuse the UABI if there is no
> documentation guiding it on proper use. If userspace or end users
> observe different behaviour, that's bad even if it's not a regression.
>
> I don't like the situation either, but it is what it is. UABI stability
> trumps everything regardless of whether it was documented or not.
>
> I bet userspace is going to use this as a "make it faster, make it
> hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> library that stamps any and all fences with an expired deadline to
> just squeeze out a little more through some weird side-effect.

Userspace already has various (driver specific) debugfs/sysfs that it
can use if it wants to make it hotter and drain batteries faster, so
I'm not seeing a strong need to cater to the "turn it up to eleven"
crowd here.  And really your point feels like a good reason to _not_
document this as anything more than a hint.

Back in the real world, mobile games are already well aware of the fps
vs battery-life (and therefore gameplay) tradeoff.  But what is
missing is a way to inform the kernel of userspace's intentions, so
that gpu dvfs can make intelligent decisions.  This series is meant to
bridge that gap.

BR,
-R

> Well, that's hopefully overboard in scaring, but in the end, I would
> like to see UABI documented so I can have a feeling of what it is for
> and how it was intended to be used. That's all.
>
>
> Thanks,
> pq

Pekka Paalanen Feb. 27, 2023, 9:34 a.m. UTC | #25

On Fri, 24 Feb 2023 11:44:53 -0800
Rob Clark <robdclark@gmail.com> wrote:

> On Fri, Feb 24, 2023 at 2:24 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >
> > On Fri, 24 Feb 2023 09:41:46 +0000
> > Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> >  
> > > On 24/02/2023 09:26, Pekka Paalanen wrote:  
> > > > On Thu, 23 Feb 2023 10:51:48 -0800
> > > > Rob Clark <robdclark@gmail.com> wrote:
> > > >  
> > > >> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:  
> > > >>>
> > > >>> On Wed, 22 Feb 2023 07:37:26 -0800
> > > >>> Rob Clark <robdclark@gmail.com> wrote:
> > > >>>  
> > > >>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:  
> > > >
> > > > ...
> > > >  
> > > >>>>> On another matter, if the application uses SET_DEADLINE with one
> > > >>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > > >>>>> another timestamp, what should happen?  
> > > >>>>
> > > >>>> The expectation is that many deadline hints can be set on a fence.
> > > >>>> The fence signaller should track the soonest deadline.  
> > > >>>
> > > >>> You need to document that as UAPI, since it is observable to userspace.
> > > >>> It would be bad if drivers or subsystems would differ in behaviour.
> > > >>>  
> > > >>
> > > >> It is in the end a hint.  It is about giving the driver more
> > > >> information so that it can make better choices.  But the driver is
> > > >> even free to ignore it.  So maybe "expectation" is too strong of a
> > > >> word.  Rather, any other behavior doesn't really make sense.  But it
> > > >> could end up being dictated by how the hw and/or fw works.  
> > > >
> > > > It will stop being a hint once it has been implemented and used in the
> > > > wild long enough. The kernel userspace regression rules make sure of
> > > > that.  
> > >
> > > Yeah, tricky and maybe a gray area in this case. I think we eluded
> > > elsewhere in the thread that renaming the thing might be an option.
> > >
> > > So maybe instead of deadline, which is a very strong word, use something
> > > along the lines of "present time hint", or "signalled time hint"? Maybe
> > > reads clumsy. Just throwing some ideas for a start.  
> >
> > You can try, but I fear that if it ever changes behaviour and
> > someone notices that, it's labelled as a kernel regression. I don't
> > think documentation has ever been the authoritative definition of UABI
> > in Linux, it just guides drivers and userspace towards a common
> > understanding and common usage patterns.
> >
> > So even if the UABI contract is not documented (ugh), you need to be
> > prepared to set the UABI contract through kernel implementation.
> >
> > If you do not document the UABI contract, then different drivers are
> > likely to implement it differently, leading to differing behaviour.
> > Also userspace will invent wild ways to abuse the UABI if there is no
> > documentation guiding it on proper use. If userspace or end users
> > observe different behaviour, that's bad even if it's not a regression.
> >
> > I don't like the situation either, but it is what it is. UABI stability
> > trumps everything regardless of whether it was documented or not.
> >
> > I bet userspace is going to use this as a "make it faster, make it
> > hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> > library that stamps any and all fences with an expired deadline to
> > just squeeze out a little more through some weird side-effect.  
> 
> Userspace already has various (driver specific) debugfs/sysfs that it
> can use if it wants to make it hotter and drain batteries faster, so
> I'm not seeing a strong need to cater to the "turn it up to eleven"
> crowd here.  And really your point feels like a good reason to _not_
> document this as anything more than a hint.

My point is that no matter what you say in documentation or leave
unsaid, people can and will abuse this by the behaviour it provides
anyway, like every other UABI.

So why not just document what it is supposed to do? It cannot get any
worse. Maybe you get lucky instead and people don't abuse it that much
if they read the docs.

E.g. can it affect GPU job scheduling, can it affect GPU clocks, can it
affect power consumption, and so on.

> Back in the real world, mobile games are already well aware of the fps
> vs battery-life (and therefore gameplay) tradeoff.  But what is
> missing is a way to inform the kernel of userspace's intentions, so
> that gpu dvfs can make intelligent decisions.  This series is meant to
> bridge that gap.

Then document that. As long as you document it properly: what you
expect it to be used for and how.

Or if this is reserved strictly for Mesa drivers, then document that.

You can also stop CC'ing me if you don't want attention to UABI docs. I
don't read dri-devel@ unless I'm explicitly CC'd nowadays.


Thanks,
pq

Rob Clark Feb. 27, 2023, 6:43 p.m. UTC | #26

On Mon, Feb 27, 2023 at 1:34 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>
> On Fri, 24 Feb 2023 11:44:53 -0800
> Rob Clark <robdclark@gmail.com> wrote:
>
> > On Fri, Feb 24, 2023 at 2:24 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > >
> > > On Fri, 24 Feb 2023 09:41:46 +0000
> > > Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > >
> > > > On 24/02/2023 09:26, Pekka Paalanen wrote:
> > > > > On Thu, 23 Feb 2023 10:51:48 -0800
> > > > > Rob Clark <robdclark@gmail.com> wrote:
> > > > >
> > > > >> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > >>>
> > > > >>> On Wed, 22 Feb 2023 07:37:26 -0800
> > > > >>> Rob Clark <robdclark@gmail.com> wrote:
> > > > >>>
> > > > >>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > >
> > > > > ...
> > > > >
> > > > >>>>> On another matter, if the application uses SET_DEADLINE with one
> > > > >>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > > > >>>>> another timestamp, what should happen?
> > > > >>>>
> > > > >>>> The expectation is that many deadline hints can be set on a fence.
> > > > >>>> The fence signaller should track the soonest deadline.
> > > > >>>
> > > > >>> You need to document that as UAPI, since it is observable to userspace.
> > > > >>> It would be bad if drivers or subsystems would differ in behaviour.
> > > > >>>
> > > > >>
> > > > >> It is in the end a hint.  It is about giving the driver more
> > > > >> information so that it can make better choices.  But the driver is
> > > > >> even free to ignore it.  So maybe "expectation" is too strong of a
> > > > >> word.  Rather, any other behavior doesn't really make sense.  But it
> > > > >> could end up being dictated by how the hw and/or fw works.
> > > > >
> > > > > It will stop being a hint once it has been implemented and used in the
> > > > > wild long enough. The kernel userspace regression rules make sure of
> > > > > that.
> > > >
> > > > Yeah, tricky and maybe a gray area in this case. I think we eluded
> > > > elsewhere in the thread that renaming the thing might be an option.
> > > >
> > > > So maybe instead of deadline, which is a very strong word, use something
> > > > along the lines of "present time hint", or "signalled time hint"? Maybe
> > > > reads clumsy. Just throwing some ideas for a start.
> > >
> > > You can try, but I fear that if it ever changes behaviour and
> > > someone notices that, it's labelled as a kernel regression. I don't
> > > think documentation has ever been the authoritative definition of UABI
> > > in Linux, it just guides drivers and userspace towards a common
> > > understanding and common usage patterns.
> > >
> > > So even if the UABI contract is not documented (ugh), you need to be
> > > prepared to set the UABI contract through kernel implementation.
> > >
> > > If you do not document the UABI contract, then different drivers are
> > > likely to implement it differently, leading to differing behaviour.
> > > Also userspace will invent wild ways to abuse the UABI if there is no
> > > documentation guiding it on proper use. If userspace or end users
> > > observe different behaviour, that's bad even if it's not a regression.
> > >
> > > I don't like the situation either, but it is what it is. UABI stability
> > > trumps everything regardless of whether it was documented or not.
> > >
> > > I bet userspace is going to use this as a "make it faster, make it
> > > hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> > > library that stamps any and all fences with an expired deadline to
> > > just squeeze out a little more through some weird side-effect.
> >
> > Userspace already has various (driver specific) debugfs/sysfs that it
> > can use if it wants to make it hotter and drain batteries faster, so
> > I'm not seeing a strong need to cater to the "turn it up to eleven"
> > crowd here.  And really your point feels like a good reason to _not_
> > document this as anything more than a hint.
>
> My point is that no matter what you say in documentation or leave
> unsaid, people can and will abuse this by the behaviour it provides
> anyway, like every other UABI.
>
> So why not just document what it is supposed to do? It cannot get any
> worse. Maybe you get lucky instead and people don't abuse it that much
> if they read the docs.
>
> E.g. can it affect GPU job scheduling, can it affect GPU clocks, can it
> affect power consumption, and so on.

I expect, potentially, all or any, or none of the above ;-)

I guess I could document it as such, just to preempt potential
complaints about broken spacebar-heater.  The question is, where?  I
could add something about fence deadline hints in dma-buf.rst, which
would I think be useful in general for driver writers.  But there
isn't really any existing documents other than headerdoc comments for
sync_file ioctls, or drm_syncobj related ioctls (the latter are really
just for mesa to use, so maybe that is ok).

>
> > Back in the real world, mobile games are already well aware of the fps
> > vs battery-life (and therefore gameplay) tradeoff.  But what is
> > missing is a way to inform the kernel of userspace's intentions, so
> > that gpu dvfs can make intelligent decisions.  This series is meant to
> > bridge that gap.
>
> Then document that. As long as you document it properly: what you
> expect it to be used for and how.
>
> Or if this is reserved strictly for Mesa drivers, then document that.

I expect the SET_DEADLINE ioctl to be useful to compositors, and maybe
a few other cases.  I'd like to use it in virglrenderer to bridge
guest deadline hints to host kernel, for example.

BR,
-R

> You can also stop CC'ing me if you don't want attention to UABI docs. I
> don't read dri-devel@ unless I'm explicitly CC'd nowadays.
>
>
> Thanks,
> pq

Rodrigo Vivi Feb. 27, 2023, 9:35 p.m. UTC | #27

On Fri, Feb 24, 2023 at 09:59:57AM -0800, Rob Clark wrote:
> On Fri, Feb 24, 2023 at 7:27 AM Luben Tuikov <luben.tuikov@amd.com> wrote:
> >
> > On 2023-02-24 06:37, Tvrtko Ursulin wrote:
> > >
> > > On 24/02/2023 11:00, Pekka Paalanen wrote:
> > >> On Fri, 24 Feb 2023 10:50:51 +0000
> > >> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > >>
> > >>> On 24/02/2023 10:24, Pekka Paalanen wrote:
> > >>>> On Fri, 24 Feb 2023 09:41:46 +0000
> > >>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > >>>>
> > >>>>> On 24/02/2023 09:26, Pekka Paalanen wrote:
> > >>>>>> On Thu, 23 Feb 2023 10:51:48 -0800
> > >>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>> On Wed, 22 Feb 2023 07:37:26 -0800
> > >>>>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > >>>>>>
> > >>>>>> ...
> > >>>>>>
> > >>>>>>>>>> On another matter, if the application uses SET_DEADLINE with one
> > >>>>>>>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > >>>>>>>>>> another timestamp, what should happen?
> > >>>>>>>>>
> > >>>>>>>>> The expectation is that many deadline hints can be set on a fence.
> > >>>>>>>>> The fence signaller should track the soonest deadline.
> > >>>>>>>>
> > >>>>>>>> You need to document that as UAPI, since it is observable to userspace.
> > >>>>>>>> It would be bad if drivers or subsystems would differ in behaviour.
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> It is in the end a hint.  It is about giving the driver more
> > >>>>>>> information so that it can make better choices.  But the driver is
> > >>>>>>> even free to ignore it.  So maybe "expectation" is too strong of a
> > >>>>>>> word.  Rather, any other behavior doesn't really make sense.  But it
> > >>>>>>> could end up being dictated by how the hw and/or fw works.
> > >>>>>>
> > >>>>>> It will stop being a hint once it has been implemented and used in the
> > >>>>>> wild long enough. The kernel userspace regression rules make sure of
> > >>>>>> that.
> > >>>>>
> > >>>>> Yeah, tricky and maybe a gray area in this case. I think we eluded
> > >>>>> elsewhere in the thread that renaming the thing might be an option.
> > >>>>>
> > >>>>> So maybe instead of deadline, which is a very strong word, use something
> > >>>>> along the lines of "present time hint", or "signalled time hint"? Maybe
> > >>>>> reads clumsy. Just throwing some ideas for a start.
> > >>>>
> > >>>> You can try, but I fear that if it ever changes behaviour and
> > >>>> someone notices that, it's labelled as a kernel regression. I don't
> > >>>> think documentation has ever been the authoritative definition of UABI
> > >>>> in Linux, it just guides drivers and userspace towards a common
> > >>>> understanding and common usage patterns.
> > >>>>
> > >>>> So even if the UABI contract is not documented (ugh), you need to be
> > >>>> prepared to set the UABI contract through kernel implementation.
> > >>>
> > >>> To be the devil's advocate it probably wouldn't be an ABI regression but
> > >>> just an regression. Same way as what nice(2) priorities mean hasn't
> > >>> always been the same over the years, I don't think there is a strict
> > >>> contract.
> > >>>
> > >>> Having said that, it may be different with latency sensitive stuff such
> > >>> as UIs though since it is very observable and can be very painful to users.
> > >>>
> > >>>> If you do not document the UABI contract, then different drivers are
> > >>>> likely to implement it differently, leading to differing behaviour.
> > >>>> Also userspace will invent wild ways to abuse the UABI if there is no
> > >>>> documentation guiding it on proper use. If userspace or end users
> > >>>> observe different behaviour, that's bad even if it's not a regression.
> > >>>>
> > >>>> I don't like the situation either, but it is what it is. UABI stability
> > >>>> trumps everything regardless of whether it was documented or not.
> > >>>>
> > >>>> I bet userspace is going to use this as a "make it faster, make it
> > >>>> hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> > >>>> library that stamps any and all fences with an expired deadline to
> > >>>> just squeeze out a little more through some weird side-effect.
> > >>>>
> > >>>> Well, that's hopefully overboard in scaring, but in the end, I would
> > >>>> like to see UABI documented so I can have a feeling of what it is for
> > >>>> and how it was intended to be used. That's all.
> > >>>
> > >>> We share the same concern. If you read elsewhere in these threads you
> > >>> will notice I have been calling this an "arms race". If the ability to
> > >>> make yourself go faster does not required additional privilege I also
> > >>> worry everyone will do it at which point it becomes pointless. So yes, I
> > >>> do share this concern about exposing any of this as an unprivileged uapi.
> > >>>
> > >>> Is it possible to limit access to only compositors in some sane way?
> > >>> Sounds tricky when dma-fence should be disconnected from DRM..
> > >>
> > >> Maybe it's not that bad in this particular case, because we are talking
> > >> only about boosting GPU clocks which benefits everyone (except
> > >> battery life) and it does not penalize other programs like e.g.
> > >> job priorities do.
> > >
> > > Apart from efficiency that you mentioned, which does not always favor
> > > higher clocks, sometimes thermal budget is also shared between CPU and
> > > GPU. So more GPU clocks can mean fewer CPU clocks. It's really hard to
> > > make optimal choices without the full coordination between both schedulers.
> > >
> > > But that is even not the main point, which is that if everyone sets the
> > > immediate deadline then having the deadline API is a bit pointless. For
> > > instance there is a reason negative nice needs CAP_SYS_ADMIN.
> > >
> > > However Rob has also pointed out the existence of uclamp.min via
> > > sched_setattr which is unprivileged and can influence frequency
> > > selection in the CPU world, so I conceded on that point. If CPU world
> > > has accepted it so can we I guess.
> > >
> > > So IMO we are back to whether we can agree defining it is a hint is good
> > > enough, be in via the name of the ioctl/flag itself or via documentation.
> > >
> > >> Drivers are not going to use the deadline for scheduling priorities,
> > >> right? I don't recall seeing any mention of that.
> > >>
> > >> ...right?
> > >
> > > I wouldn't have thought it would be beneficial to preclude that, or
> > > assume what drivers would do with the info to begin with.
> > >
> > > For instance in i915 we almost had a deadline based scheduler which was
> > > much fairer than the current priority sorted fifo and in an ideal world
> > > we would either revive or re-implement that idea. In which case
> > > considering the fence deadline would naturally slot in and give true
> > > integration with compositor deadlines (not just boost clocks and pray it
> > > helps).
> > How is user-space to decide whether to use ioctl(SET_DEADLINE) or
> > poll(POLLPRI)?
> 
> Implementation of blocking gl/vk/cl APIs, like glFinish() would use
> poll(POLLPRI).  It could also set an immediate deadline and then call
> poll() without POLLPRI.
> 
> Other than compositors which do frame-pacing I expect the main usage
> of either of these is mesa.

Okay, so it looks like we already agreed that having a way to bump frequency
from userspace is acceptable. either because there are already other ways
that you can waste power and because this already acceptable in the CPU
world.

But why we are doing this in hidden ways then?

Why can't we have this hint per context that is getting executed?
(either with a boost-context flag or with some low/med/max or '-1' to '1'
value like the latency priority)?

I don't like the waitboost because this heurisitic fails in some media cases.
I don't like the global setting because we might be alternating a top-priority
with low-priority cases...

So, why not something per context in execution?

> 
> BR,
> -R
> 
> > --
> > Regards,
> > Luben
> >

Rob Clark Feb. 27, 2023, 10:20 p.m. UTC | #28

On Mon, Feb 27, 2023 at 1:36 PM Rodrigo Vivi <rodrigo.vivi@intel.com> wrote:
>
> On Fri, Feb 24, 2023 at 09:59:57AM -0800, Rob Clark wrote:
> > On Fri, Feb 24, 2023 at 7:27 AM Luben Tuikov <luben.tuikov@amd.com> wrote:
> > >
> > > On 2023-02-24 06:37, Tvrtko Ursulin wrote:
> > > >
> > > > On 24/02/2023 11:00, Pekka Paalanen wrote:
> > > >> On Fri, 24 Feb 2023 10:50:51 +0000
> > > >> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > >>
> > > >>> On 24/02/2023 10:24, Pekka Paalanen wrote:
> > > >>>> On Fri, 24 Feb 2023 09:41:46 +0000
> > > >>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > >>>>
> > > >>>>> On 24/02/2023 09:26, Pekka Paalanen wrote:
> > > >>>>>> On Thu, 23 Feb 2023 10:51:48 -0800
> > > >>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > >>>>>>
> > > >>>>>>> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>> On Wed, 22 Feb 2023 07:37:26 -0800
> > > >>>>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > >>>>>>
> > > >>>>>> ...
> > > >>>>>>
> > > >>>>>>>>>> On another matter, if the application uses SET_DEADLINE with one
> > > >>>>>>>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > > >>>>>>>>>> another timestamp, what should happen?
> > > >>>>>>>>>
> > > >>>>>>>>> The expectation is that many deadline hints can be set on a fence.
> > > >>>>>>>>> The fence signaller should track the soonest deadline.
> > > >>>>>>>>
> > > >>>>>>>> You need to document that as UAPI, since it is observable to userspace.
> > > >>>>>>>> It would be bad if drivers or subsystems would differ in behaviour.
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>> It is in the end a hint.  It is about giving the driver more
> > > >>>>>>> information so that it can make better choices.  But the driver is
> > > >>>>>>> even free to ignore it.  So maybe "expectation" is too strong of a
> > > >>>>>>> word.  Rather, any other behavior doesn't really make sense.  But it
> > > >>>>>>> could end up being dictated by how the hw and/or fw works.
> > > >>>>>>
> > > >>>>>> It will stop being a hint once it has been implemented and used in the
> > > >>>>>> wild long enough. The kernel userspace regression rules make sure of
> > > >>>>>> that.
> > > >>>>>
> > > >>>>> Yeah, tricky and maybe a gray area in this case. I think we eluded
> > > >>>>> elsewhere in the thread that renaming the thing might be an option.
> > > >>>>>
> > > >>>>> So maybe instead of deadline, which is a very strong word, use something
> > > >>>>> along the lines of "present time hint", or "signalled time hint"? Maybe
> > > >>>>> reads clumsy. Just throwing some ideas for a start.
> > > >>>>
> > > >>>> You can try, but I fear that if it ever changes behaviour and
> > > >>>> someone notices that, it's labelled as a kernel regression. I don't
> > > >>>> think documentation has ever been the authoritative definition of UABI
> > > >>>> in Linux, it just guides drivers and userspace towards a common
> > > >>>> understanding and common usage patterns.
> > > >>>>
> > > >>>> So even if the UABI contract is not documented (ugh), you need to be
> > > >>>> prepared to set the UABI contract through kernel implementation.
> > > >>>
> > > >>> To be the devil's advocate it probably wouldn't be an ABI regression but
> > > >>> just an regression. Same way as what nice(2) priorities mean hasn't
> > > >>> always been the same over the years, I don't think there is a strict
> > > >>> contract.
> > > >>>
> > > >>> Having said that, it may be different with latency sensitive stuff such
> > > >>> as UIs though since it is very observable and can be very painful to users.
> > > >>>
> > > >>>> If you do not document the UABI contract, then different drivers are
> > > >>>> likely to implement it differently, leading to differing behaviour.
> > > >>>> Also userspace will invent wild ways to abuse the UABI if there is no
> > > >>>> documentation guiding it on proper use. If userspace or end users
> > > >>>> observe different behaviour, that's bad even if it's not a regression.
> > > >>>>
> > > >>>> I don't like the situation either, but it is what it is. UABI stability
> > > >>>> trumps everything regardless of whether it was documented or not.
> > > >>>>
> > > >>>> I bet userspace is going to use this as a "make it faster, make it
> > > >>>> hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> > > >>>> library that stamps any and all fences with an expired deadline to
> > > >>>> just squeeze out a little more through some weird side-effect.
> > > >>>>
> > > >>>> Well, that's hopefully overboard in scaring, but in the end, I would
> > > >>>> like to see UABI documented so I can have a feeling of what it is for
> > > >>>> and how it was intended to be used. That's all.
> > > >>>
> > > >>> We share the same concern. If you read elsewhere in these threads you
> > > >>> will notice I have been calling this an "arms race". If the ability to
> > > >>> make yourself go faster does not required additional privilege I also
> > > >>> worry everyone will do it at which point it becomes pointless. So yes, I
> > > >>> do share this concern about exposing any of this as an unprivileged uapi.
> > > >>>
> > > >>> Is it possible to limit access to only compositors in some sane way?
> > > >>> Sounds tricky when dma-fence should be disconnected from DRM..
> > > >>
> > > >> Maybe it's not that bad in this particular case, because we are talking
> > > >> only about boosting GPU clocks which benefits everyone (except
> > > >> battery life) and it does not penalize other programs like e.g.
> > > >> job priorities do.
> > > >
> > > > Apart from efficiency that you mentioned, which does not always favor
> > > > higher clocks, sometimes thermal budget is also shared between CPU and
> > > > GPU. So more GPU clocks can mean fewer CPU clocks. It's really hard to
> > > > make optimal choices without the full coordination between both schedulers.
> > > >
> > > > But that is even not the main point, which is that if everyone sets the
> > > > immediate deadline then having the deadline API is a bit pointless. For
> > > > instance there is a reason negative nice needs CAP_SYS_ADMIN.
> > > >
> > > > However Rob has also pointed out the existence of uclamp.min via
> > > > sched_setattr which is unprivileged and can influence frequency
> > > > selection in the CPU world, so I conceded on that point. If CPU world
> > > > has accepted it so can we I guess.
> > > >
> > > > So IMO we are back to whether we can agree defining it is a hint is good
> > > > enough, be in via the name of the ioctl/flag itself or via documentation.
> > > >
> > > >> Drivers are not going to use the deadline for scheduling priorities,
> > > >> right? I don't recall seeing any mention of that.
> > > >>
> > > >> ...right?
> > > >
> > > > I wouldn't have thought it would be beneficial to preclude that, or
> > > > assume what drivers would do with the info to begin with.
> > > >
> > > > For instance in i915 we almost had a deadline based scheduler which was
> > > > much fairer than the current priority sorted fifo and in an ideal world
> > > > we would either revive or re-implement that idea. In which case
> > > > considering the fence deadline would naturally slot in and give true
> > > > integration with compositor deadlines (not just boost clocks and pray it
> > > > helps).
> > > How is user-space to decide whether to use ioctl(SET_DEADLINE) or
> > > poll(POLLPRI)?
> >
> > Implementation of blocking gl/vk/cl APIs, like glFinish() would use
> > poll(POLLPRI).  It could also set an immediate deadline and then call
> > poll() without POLLPRI.
> >
> > Other than compositors which do frame-pacing I expect the main usage
> > of either of these is mesa.
>
> Okay, so it looks like we already agreed that having a way to bump frequency
> from userspace is acceptable. either because there are already other ways
> that you can waste power and because this already acceptable in the CPU
> world.
>
> But why we are doing this in hidden ways then?
>
> Why can't we have this hint per context that is getting executed?
> (either with a boost-context flag or with some low/med/max or '-1' to '1'
> value like the latency priority)?
>
> I don't like the waitboost because this heurisitic fails in some media cases.
> I don't like the global setting because we might be alternating a top-priority
> with low-priority cases...
>
> So, why not something per context in execution?
>

It needs to be finer granularity than per-context, because not all
waits should trigger boosting.  For example, virglrenderer ends up
with a thread polling unsignaled fences to know when to signal an
interrupt to the guest virtgpu.  This alone shouldn't trigger
boosting.  (We also wouldn't want to completely disable boosting for
virglrenderer.)  Or the usermode driver could be waiting on a fence to
know when to do some cleanup.

That is not to say that there isn't room for per-context flags to
disable/enable boosting for fences created by that context, meaning it
could be an AND operation for i915 if it needs to be.

BR,
-R

Sebastian Wick Feb. 27, 2023, 10:44 p.m. UTC | #29

On Mon, Feb 27, 2023 at 11:20 PM Rob Clark <robdclark@gmail.com> wrote:
>
> On Mon, Feb 27, 2023 at 1:36 PM Rodrigo Vivi <rodrigo.vivi@intel.com> wrote:
> >
> > On Fri, Feb 24, 2023 at 09:59:57AM -0800, Rob Clark wrote:
> > > On Fri, Feb 24, 2023 at 7:27 AM Luben Tuikov <luben.tuikov@amd.com> wrote:
> > > >
> > > > On 2023-02-24 06:37, Tvrtko Ursulin wrote:
> > > > >
> > > > > On 24/02/2023 11:00, Pekka Paalanen wrote:
> > > > >> On Fri, 24 Feb 2023 10:50:51 +0000
> > > > >> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > > >>
> > > > >>> On 24/02/2023 10:24, Pekka Paalanen wrote:
> > > > >>>> On Fri, 24 Feb 2023 09:41:46 +0000
> > > > >>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > > >>>>
> > > > >>>>> On 24/02/2023 09:26, Pekka Paalanen wrote:
> > > > >>>>>> On Thu, 23 Feb 2023 10:51:48 -0800
> > > > >>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > > >>>>>>
> > > > >>>>>>> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>> On Wed, 22 Feb 2023 07:37:26 -0800
> > > > >>>>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > >>>>>>
> > > > >>>>>> ...
> > > > >>>>>>
> > > > >>>>>>>>>> On another matter, if the application uses SET_DEADLINE with one
> > > > >>>>>>>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > > > >>>>>>>>>> another timestamp, what should happen?
> > > > >>>>>>>>>
> > > > >>>>>>>>> The expectation is that many deadline hints can be set on a fence.
> > > > >>>>>>>>> The fence signaller should track the soonest deadline.
> > > > >>>>>>>>
> > > > >>>>>>>> You need to document that as UAPI, since it is observable to userspace.
> > > > >>>>>>>> It would be bad if drivers or subsystems would differ in behaviour.
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> It is in the end a hint.  It is about giving the driver more
> > > > >>>>>>> information so that it can make better choices.  But the driver is
> > > > >>>>>>> even free to ignore it.  So maybe "expectation" is too strong of a
> > > > >>>>>>> word.  Rather, any other behavior doesn't really make sense.  But it
> > > > >>>>>>> could end up being dictated by how the hw and/or fw works.
> > > > >>>>>>
> > > > >>>>>> It will stop being a hint once it has been implemented and used in the
> > > > >>>>>> wild long enough. The kernel userspace regression rules make sure of
> > > > >>>>>> that.
> > > > >>>>>
> > > > >>>>> Yeah, tricky and maybe a gray area in this case. I think we eluded
> > > > >>>>> elsewhere in the thread that renaming the thing might be an option.
> > > > >>>>>
> > > > >>>>> So maybe instead of deadline, which is a very strong word, use something
> > > > >>>>> along the lines of "present time hint", or "signalled time hint"? Maybe
> > > > >>>>> reads clumsy. Just throwing some ideas for a start.
> > > > >>>>
> > > > >>>> You can try, but I fear that if it ever changes behaviour and
> > > > >>>> someone notices that, it's labelled as a kernel regression. I don't
> > > > >>>> think documentation has ever been the authoritative definition of UABI
> > > > >>>> in Linux, it just guides drivers and userspace towards a common
> > > > >>>> understanding and common usage patterns.
> > > > >>>>
> > > > >>>> So even if the UABI contract is not documented (ugh), you need to be
> > > > >>>> prepared to set the UABI contract through kernel implementation.
> > > > >>>
> > > > >>> To be the devil's advocate it probably wouldn't be an ABI regression but
> > > > >>> just an regression. Same way as what nice(2) priorities mean hasn't
> > > > >>> always been the same over the years, I don't think there is a strict
> > > > >>> contract.
> > > > >>>
> > > > >>> Having said that, it may be different with latency sensitive stuff such
> > > > >>> as UIs though since it is very observable and can be very painful to users.
> > > > >>>
> > > > >>>> If you do not document the UABI contract, then different drivers are
> > > > >>>> likely to implement it differently, leading to differing behaviour.
> > > > >>>> Also userspace will invent wild ways to abuse the UABI if there is no
> > > > >>>> documentation guiding it on proper use. If userspace or end users
> > > > >>>> observe different behaviour, that's bad even if it's not a regression.
> > > > >>>>
> > > > >>>> I don't like the situation either, but it is what it is. UABI stability
> > > > >>>> trumps everything regardless of whether it was documented or not.
> > > > >>>>
> > > > >>>> I bet userspace is going to use this as a "make it faster, make it
> > > > >>>> hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> > > > >>>> library that stamps any and all fences with an expired deadline to
> > > > >>>> just squeeze out a little more through some weird side-effect.
> > > > >>>>
> > > > >>>> Well, that's hopefully overboard in scaring, but in the end, I would
> > > > >>>> like to see UABI documented so I can have a feeling of what it is for
> > > > >>>> and how it was intended to be used. That's all.
> > > > >>>
> > > > >>> We share the same concern. If you read elsewhere in these threads you
> > > > >>> will notice I have been calling this an "arms race". If the ability to
> > > > >>> make yourself go faster does not required additional privilege I also
> > > > >>> worry everyone will do it at which point it becomes pointless. So yes, I
> > > > >>> do share this concern about exposing any of this as an unprivileged uapi.
> > > > >>>
> > > > >>> Is it possible to limit access to only compositors in some sane way?
> > > > >>> Sounds tricky when dma-fence should be disconnected from DRM..
> > > > >>
> > > > >> Maybe it's not that bad in this particular case, because we are talking
> > > > >> only about boosting GPU clocks which benefits everyone (except
> > > > >> battery life) and it does not penalize other programs like e.g.
> > > > >> job priorities do.
> > > > >
> > > > > Apart from efficiency that you mentioned, which does not always favor
> > > > > higher clocks, sometimes thermal budget is also shared between CPU and
> > > > > GPU. So more GPU clocks can mean fewer CPU clocks. It's really hard to
> > > > > make optimal choices without the full coordination between both schedulers.
> > > > >
> > > > > But that is even not the main point, which is that if everyone sets the
> > > > > immediate deadline then having the deadline API is a bit pointless. For
> > > > > instance there is a reason negative nice needs CAP_SYS_ADMIN.
> > > > >
> > > > > However Rob has also pointed out the existence of uclamp.min via
> > > > > sched_setattr which is unprivileged and can influence frequency
> > > > > selection in the CPU world, so I conceded on that point. If CPU world
> > > > > has accepted it so can we I guess.
> > > > >
> > > > > So IMO we are back to whether we can agree defining it is a hint is good
> > > > > enough, be in via the name of the ioctl/flag itself or via documentation.
> > > > >
> > > > >> Drivers are not going to use the deadline for scheduling priorities,
> > > > >> right? I don't recall seeing any mention of that.
> > > > >>
> > > > >> ...right?
> > > > >
> > > > > I wouldn't have thought it would be beneficial to preclude that, or
> > > > > assume what drivers would do with the info to begin with.
> > > > >
> > > > > For instance in i915 we almost had a deadline based scheduler which was
> > > > > much fairer than the current priority sorted fifo and in an ideal world
> > > > > we would either revive or re-implement that idea. In which case
> > > > > considering the fence deadline would naturally slot in and give true
> > > > > integration with compositor deadlines (not just boost clocks and pray it
> > > > > helps).
> > > > How is user-space to decide whether to use ioctl(SET_DEADLINE) or
> > > > poll(POLLPRI)?
> > >
> > > Implementation of blocking gl/vk/cl APIs, like glFinish() would use
> > > poll(POLLPRI).  It could also set an immediate deadline and then call
> > > poll() without POLLPRI.
> > >
> > > Other than compositors which do frame-pacing I expect the main usage
> > > of either of these is mesa.
> >
> > Okay, so it looks like we already agreed that having a way to bump frequency
> > from userspace is acceptable. either because there are already other ways
> > that you can waste power and because this already acceptable in the CPU
> > world.
> >
> > But why we are doing this in hidden ways then?
> >
> > Why can't we have this hint per context that is getting executed?
> > (either with a boost-context flag or with some low/med/max or '-1' to '1'
> > value like the latency priority)?
> >
> > I don't like the waitboost because this heurisitic fails in some media cases.
> > I don't like the global setting because we might be alternating a top-priority
> > with low-priority cases...
> >
> > So, why not something per context in execution?
> >
>
> It needs to be finer granularity than per-context, because not all
> waits should trigger boosting.  For example, virglrenderer ends up
> with a thread polling unsignaled fences to know when to signal an
> interrupt to the guest virtgpu.  This alone shouldn't trigger
> boosting.  (We also wouldn't want to completely disable boosting for
> virglrenderer.)  Or the usermode driver could be waiting on a fence to
> know when to do some cleanup.
>
> That is not to say that there isn't room for per-context flags to
> disable/enable boosting for fences created by that context, meaning it
> could be an AND operation for i915 if it needs to be.

First of all, I believe that the fence deadline hint is a good idea.
With that being said, I also don't think it is sufficient in a lot of
cases.

The one thing I was alluding to before and that Pekka mentioned as
well is that mutter for example has a problem where we're missing the
deadline consistently because the clocks don't ramp up fast enough and
there is a MR which is just trying to keep the GPU busy to avoid this.

It would be much better if the kernel could make sure the clocks are
all ramped up when we start submitting work. In the compositor we
actually have a lot of information that *should* influence clocks. We
know when we're going to start submitting work and when the deadline
for that work is beforehand. We know which windows are visible, and
which one should have the highest priority. We know when there are
input events which actually matter. We know when the deadline for
client work is.

In the future we also want to make sure clients know beforehand when
they should start their work and when the deadline is but that's all
very much WIP in both wayland and vulkan.

There are two issues:

1. The compositor has no way to communicate any of that information to
the kernel.
2. The only connection to client work the compositor has is a fence to
the last bit of work that must be done before the deadline after a
wl_surface.commit.

So in both cases a fence is just not the right primitive for us. We
need to be able to provide per-context/queue information for work that
will happen in the future and we need a way to refer to a
context/queue generically and over IPC to boost the clocks of the
device that a client is actually using and maybe even give priority.

But like I said, having a per-fence deadline is probably still a good
idea and doesn't conflict with any of the more coarse information.

>
> BR,
> -R
>

Rob Clark Feb. 27, 2023, 11:48 p.m. UTC | #30

On Mon, Feb 27, 2023 at 2:44 PM Sebastian Wick
<sebastian.wick@redhat.com> wrote:
>
> On Mon, Feb 27, 2023 at 11:20 PM Rob Clark <robdclark@gmail.com> wrote:
> >
> > On Mon, Feb 27, 2023 at 1:36 PM Rodrigo Vivi <rodrigo.vivi@intel.com> wrote:
> > >
> > > On Fri, Feb 24, 2023 at 09:59:57AM -0800, Rob Clark wrote:
> > > > On Fri, Feb 24, 2023 at 7:27 AM Luben Tuikov <luben.tuikov@amd.com> wrote:
> > > > >
> > > > > On 2023-02-24 06:37, Tvrtko Ursulin wrote:
> > > > > >
> > > > > > On 24/02/2023 11:00, Pekka Paalanen wrote:
> > > > > >> On Fri, 24 Feb 2023 10:50:51 +0000
> > > > > >> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > > > >>
> > > > > >>> On 24/02/2023 10:24, Pekka Paalanen wrote:
> > > > > >>>> On Fri, 24 Feb 2023 09:41:46 +0000
> > > > > >>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > > > >>>>
> > > > > >>>>> On 24/02/2023 09:26, Pekka Paalanen wrote:
> > > > > >>>>>> On Thu, 23 Feb 2023 10:51:48 -0800
> > > > > >>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Wed, 22 Feb 2023 07:37:26 -0800
> > > > > >>>>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > > >>>>>>
> > > > > >>>>>> ...
> > > > > >>>>>>
> > > > > >>>>>>>>>> On another matter, if the application uses SET_DEADLINE with one
> > > > > >>>>>>>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > > > > >>>>>>>>>> another timestamp, what should happen?
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> The expectation is that many deadline hints can be set on a fence.
> > > > > >>>>>>>>> The fence signaller should track the soonest deadline.
> > > > > >>>>>>>>
> > > > > >>>>>>>> You need to document that as UAPI, since it is observable to userspace.
> > > > > >>>>>>>> It would be bad if drivers or subsystems would differ in behaviour.
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> It is in the end a hint.  It is about giving the driver more
> > > > > >>>>>>> information so that it can make better choices.  But the driver is
> > > > > >>>>>>> even free to ignore it.  So maybe "expectation" is too strong of a
> > > > > >>>>>>> word.  Rather, any other behavior doesn't really make sense.  But it
> > > > > >>>>>>> could end up being dictated by how the hw and/or fw works.
> > > > > >>>>>>
> > > > > >>>>>> It will stop being a hint once it has been implemented and used in the
> > > > > >>>>>> wild long enough. The kernel userspace regression rules make sure of
> > > > > >>>>>> that.
> > > > > >>>>>
> > > > > >>>>> Yeah, tricky and maybe a gray area in this case. I think we eluded
> > > > > >>>>> elsewhere in the thread that renaming the thing might be an option.
> > > > > >>>>>
> > > > > >>>>> So maybe instead of deadline, which is a very strong word, use something
> > > > > >>>>> along the lines of "present time hint", or "signalled time hint"? Maybe
> > > > > >>>>> reads clumsy. Just throwing some ideas for a start.
> > > > > >>>>
> > > > > >>>> You can try, but I fear that if it ever changes behaviour and
> > > > > >>>> someone notices that, it's labelled as a kernel regression. I don't
> > > > > >>>> think documentation has ever been the authoritative definition of UABI
> > > > > >>>> in Linux, it just guides drivers and userspace towards a common
> > > > > >>>> understanding and common usage patterns.
> > > > > >>>>
> > > > > >>>> So even if the UABI contract is not documented (ugh), you need to be
> > > > > >>>> prepared to set the UABI contract through kernel implementation.
> > > > > >>>
> > > > > >>> To be the devil's advocate it probably wouldn't be an ABI regression but
> > > > > >>> just an regression. Same way as what nice(2) priorities mean hasn't
> > > > > >>> always been the same over the years, I don't think there is a strict
> > > > > >>> contract.
> > > > > >>>
> > > > > >>> Having said that, it may be different with latency sensitive stuff such
> > > > > >>> as UIs though since it is very observable and can be very painful to users.
> > > > > >>>
> > > > > >>>> If you do not document the UABI contract, then different drivers are
> > > > > >>>> likely to implement it differently, leading to differing behaviour.
> > > > > >>>> Also userspace will invent wild ways to abuse the UABI if there is no
> > > > > >>>> documentation guiding it on proper use. If userspace or end users
> > > > > >>>> observe different behaviour, that's bad even if it's not a regression.
> > > > > >>>>
> > > > > >>>> I don't like the situation either, but it is what it is. UABI stability
> > > > > >>>> trumps everything regardless of whether it was documented or not.
> > > > > >>>>
> > > > > >>>> I bet userspace is going to use this as a "make it faster, make it
> > > > > >>>> hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> > > > > >>>> library that stamps any and all fences with an expired deadline to
> > > > > >>>> just squeeze out a little more through some weird side-effect.
> > > > > >>>>
> > > > > >>>> Well, that's hopefully overboard in scaring, but in the end, I would
> > > > > >>>> like to see UABI documented so I can have a feeling of what it is for
> > > > > >>>> and how it was intended to be used. That's all.
> > > > > >>>
> > > > > >>> We share the same concern. If you read elsewhere in these threads you
> > > > > >>> will notice I have been calling this an "arms race". If the ability to
> > > > > >>> make yourself go faster does not required additional privilege I also
> > > > > >>> worry everyone will do it at which point it becomes pointless. So yes, I
> > > > > >>> do share this concern about exposing any of this as an unprivileged uapi.
> > > > > >>>
> > > > > >>> Is it possible to limit access to only compositors in some sane way?
> > > > > >>> Sounds tricky when dma-fence should be disconnected from DRM..
> > > > > >>
> > > > > >> Maybe it's not that bad in this particular case, because we are talking
> > > > > >> only about boosting GPU clocks which benefits everyone (except
> > > > > >> battery life) and it does not penalize other programs like e.g.
> > > > > >> job priorities do.
> > > > > >
> > > > > > Apart from efficiency that you mentioned, which does not always favor
> > > > > > higher clocks, sometimes thermal budget is also shared between CPU and
> > > > > > GPU. So more GPU clocks can mean fewer CPU clocks. It's really hard to
> > > > > > make optimal choices without the full coordination between both schedulers.
> > > > > >
> > > > > > But that is even not the main point, which is that if everyone sets the
> > > > > > immediate deadline then having the deadline API is a bit pointless. For
> > > > > > instance there is a reason negative nice needs CAP_SYS_ADMIN.
> > > > > >
> > > > > > However Rob has also pointed out the existence of uclamp.min via
> > > > > > sched_setattr which is unprivileged and can influence frequency
> > > > > > selection in the CPU world, so I conceded on that point. If CPU world
> > > > > > has accepted it so can we I guess.
> > > > > >
> > > > > > So IMO we are back to whether we can agree defining it is a hint is good
> > > > > > enough, be in via the name of the ioctl/flag itself or via documentation.
> > > > > >
> > > > > >> Drivers are not going to use the deadline for scheduling priorities,
> > > > > >> right? I don't recall seeing any mention of that.
> > > > > >>
> > > > > >> ...right?
> > > > > >
> > > > > > I wouldn't have thought it would be beneficial to preclude that, or
> > > > > > assume what drivers would do with the info to begin with.
> > > > > >
> > > > > > For instance in i915 we almost had a deadline based scheduler which was
> > > > > > much fairer than the current priority sorted fifo and in an ideal world
> > > > > > we would either revive or re-implement that idea. In which case
> > > > > > considering the fence deadline would naturally slot in and give true
> > > > > > integration with compositor deadlines (not just boost clocks and pray it
> > > > > > helps).
> > > > > How is user-space to decide whether to use ioctl(SET_DEADLINE) or
> > > > > poll(POLLPRI)?
> > > >
> > > > Implementation of blocking gl/vk/cl APIs, like glFinish() would use
> > > > poll(POLLPRI).  It could also set an immediate deadline and then call
> > > > poll() without POLLPRI.
> > > >
> > > > Other than compositors which do frame-pacing I expect the main usage
> > > > of either of these is mesa.
> > >
> > > Okay, so it looks like we already agreed that having a way to bump frequency
> > > from userspace is acceptable. either because there are already other ways
> > > that you can waste power and because this already acceptable in the CPU
> > > world.
> > >
> > > But why we are doing this in hidden ways then?
> > >
> > > Why can't we have this hint per context that is getting executed?
> > > (either with a boost-context flag or with some low/med/max or '-1' to '1'
> > > value like the latency priority)?
> > >
> > > I don't like the waitboost because this heurisitic fails in some media cases.
> > > I don't like the global setting because we might be alternating a top-priority
> > > with low-priority cases...
> > >
> > > So, why not something per context in execution?
> > >
> >
> > It needs to be finer granularity than per-context, because not all
> > waits should trigger boosting.  For example, virglrenderer ends up
> > with a thread polling unsignaled fences to know when to signal an
> > interrupt to the guest virtgpu.  This alone shouldn't trigger
> > boosting.  (We also wouldn't want to completely disable boosting for
> > virglrenderer.)  Or the usermode driver could be waiting on a fence to
> > know when to do some cleanup.
> >
> > That is not to say that there isn't room for per-context flags to
> > disable/enable boosting for fences created by that context, meaning it
> > could be an AND operation for i915 if it needs to be.
>
> First of all, I believe that the fence deadline hint is a good idea.
> With that being said, I also don't think it is sufficient in a lot of
> cases.
>
> The one thing I was alluding to before and that Pekka mentioned as
> well is that mutter for example has a problem where we're missing the
> deadline consistently because the clocks don't ramp up fast enough and
> there is a MR which is just trying to keep the GPU busy to avoid this.

the dynamic double/triple buffer thing?

> It would be much better if the kernel could make sure the clocks are
> all ramped up when we start submitting work. In the compositor we
> actually have a lot of information that *should* influence clocks. We
> know when we're going to start submitting work and when the deadline
> for that work is beforehand. We know which windows are visible, and
> which one should have the highest priority.

This sounds like something orthogonal.. something for cgroups?  Ie.
android moves visible/foreground apps to a different cgroup to given
them higher priority.  Tvrtko had a patchset to add drm cgroup
support..

> We know when there are
> input events which actually matter.

This I see input as a different boost source for the driver.  (Ie. one
boost signal is missing fence deadlines, another is input events,
etc.)

We end up using downstream input-handlers on the kernel side for this.
Partially for the freq boost (but mostly not, UI interactive workloads
like touchscreen scrolling don't generally need high GPU freqs, they
are more memory bandwidth limited if they are limited by anything)..
really the reason here is to get a head-start on the ~2ms that it
takes to power up the GPU if it is suspended.

But this is not quite perfect, since for example some keys should be
handled on key-down but others on key-up.

But again, this is something different from fence deadlines.  I'm
interested in proposals because we do need something for this.  But I
think it is something is orthogonal to this series.  For input, we
want the kernel to know long before userspace is ready to submit
rendering.

> We know when the deadline for
> client work is.
>
> In the future we also want to make sure clients know beforehand when
> they should start their work and when the deadline is but that's all
> very much WIP in both wayland and vulkan.
>
> There are two issues:
>
> 1. The compositor has no way to communicate any of that information to
> the kernel.
> 2. The only connection to client work the compositor has is a fence to
> the last bit of work that must be done before the deadline after a
> wl_surface.commit.

If the client isn't using multiple GPUs, a single fence should be
sufficient.  And even if it is, well we still have all the dependency
information on the kernel side.  Ie. drm/sched knows what fences it is
waiting on if it is waiting to schedule the work associated with the
last fence.  It would otherwise require drm/sched to be a bit more
tricky than it is so far in this series.

But I think the normal dual-gpu case, the app is only dealing with a single GPU?

> So in both cases a fence is just not the right primitive for us. We
> need to be able to provide per-context/queue information for work that
> will happen in the future and we need a way to refer to a
> context/queue generically and over IPC to boost the clocks of the
> device that a client is actually using and maybe even give priority.
>
> But like I said, having a per-fence deadline is probably still a good
> idea and doesn't conflict with any of the more coarse information.

Yeah, I think the thing is you need multiple things, and this is only
one of them ;-)

BR,
-R

Sebastian Wick Feb. 28, 2023, 2:30 p.m. UTC | #31

On Tue, Feb 28, 2023 at 12:48 AM Rob Clark <robdclark@gmail.com> wrote:
>
> On Mon, Feb 27, 2023 at 2:44 PM Sebastian Wick
> <sebastian.wick@redhat.com> wrote:
> >
> > On Mon, Feb 27, 2023 at 11:20 PM Rob Clark <robdclark@gmail.com> wrote:
> > >
> > > On Mon, Feb 27, 2023 at 1:36 PM Rodrigo Vivi <rodrigo.vivi@intel.com> wrote:
> > > >
> > > > On Fri, Feb 24, 2023 at 09:59:57AM -0800, Rob Clark wrote:
> > > > > On Fri, Feb 24, 2023 at 7:27 AM Luben Tuikov <luben.tuikov@amd.com> wrote:
> > > > > >
> > > > > > On 2023-02-24 06:37, Tvrtko Ursulin wrote:
> > > > > > >
> > > > > > > On 24/02/2023 11:00, Pekka Paalanen wrote:
> > > > > > >> On Fri, 24 Feb 2023 10:50:51 +0000
> > > > > > >> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > > > > >>
> > > > > > >>> On 24/02/2023 10:24, Pekka Paalanen wrote:
> > > > > > >>>> On Fri, 24 Feb 2023 09:41:46 +0000
> > > > > > >>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > > > > >>>>
> > > > > > >>>>> On 24/02/2023 09:26, Pekka Paalanen wrote:
> > > > > > >>>>>> On Thu, 23 Feb 2023 10:51:48 -0800
> > > > > > >>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > > > > >>>>>>
> > > > > > >>>>>>> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> On Wed, 22 Feb 2023 07:37:26 -0800
> > > > > > >>>>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > > > >>>>>>
> > > > > > >>>>>> ...
> > > > > > >>>>>>
> > > > > > >>>>>>>>>> On another matter, if the application uses SET_DEADLINE with one
> > > > > > >>>>>>>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > > > > > >>>>>>>>>> another timestamp, what should happen?
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> The expectation is that many deadline hints can be set on a fence.
> > > > > > >>>>>>>>> The fence signaller should track the soonest deadline.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> You need to document that as UAPI, since it is observable to userspace.
> > > > > > >>>>>>>> It would be bad if drivers or subsystems would differ in behaviour.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>> It is in the end a hint.  It is about giving the driver more
> > > > > > >>>>>>> information so that it can make better choices.  But the driver is
> > > > > > >>>>>>> even free to ignore it.  So maybe "expectation" is too strong of a
> > > > > > >>>>>>> word.  Rather, any other behavior doesn't really make sense.  But it
> > > > > > >>>>>>> could end up being dictated by how the hw and/or fw works.
> > > > > > >>>>>>
> > > > > > >>>>>> It will stop being a hint once it has been implemented and used in the
> > > > > > >>>>>> wild long enough. The kernel userspace regression rules make sure of
> > > > > > >>>>>> that.
> > > > > > >>>>>
> > > > > > >>>>> Yeah, tricky and maybe a gray area in this case. I think we eluded
> > > > > > >>>>> elsewhere in the thread that renaming the thing might be an option.
> > > > > > >>>>>
> > > > > > >>>>> So maybe instead of deadline, which is a very strong word, use something
> > > > > > >>>>> along the lines of "present time hint", or "signalled time hint"? Maybe
> > > > > > >>>>> reads clumsy. Just throwing some ideas for a start.
> > > > > > >>>>
> > > > > > >>>> You can try, but I fear that if it ever changes behaviour and
> > > > > > >>>> someone notices that, it's labelled as a kernel regression. I don't
> > > > > > >>>> think documentation has ever been the authoritative definition of UABI
> > > > > > >>>> in Linux, it just guides drivers and userspace towards a common
> > > > > > >>>> understanding and common usage patterns.
> > > > > > >>>>
> > > > > > >>>> So even if the UABI contract is not documented (ugh), you need to be
> > > > > > >>>> prepared to set the UABI contract through kernel implementation.
> > > > > > >>>
> > > > > > >>> To be the devil's advocate it probably wouldn't be an ABI regression but
> > > > > > >>> just an regression. Same way as what nice(2) priorities mean hasn't
> > > > > > >>> always been the same over the years, I don't think there is a strict
> > > > > > >>> contract.
> > > > > > >>>
> > > > > > >>> Having said that, it may be different with latency sensitive stuff such
> > > > > > >>> as UIs though since it is very observable and can be very painful to users.
> > > > > > >>>
> > > > > > >>>> If you do not document the UABI contract, then different drivers are
> > > > > > >>>> likely to implement it differently, leading to differing behaviour.
> > > > > > >>>> Also userspace will invent wild ways to abuse the UABI if there is no
> > > > > > >>>> documentation guiding it on proper use. If userspace or end users
> > > > > > >>>> observe different behaviour, that's bad even if it's not a regression.
> > > > > > >>>>
> > > > > > >>>> I don't like the situation either, but it is what it is. UABI stability
> > > > > > >>>> trumps everything regardless of whether it was documented or not.
> > > > > > >>>>
> > > > > > >>>> I bet userspace is going to use this as a "make it faster, make it
> > > > > > >>>> hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> > > > > > >>>> library that stamps any and all fences with an expired deadline to
> > > > > > >>>> just squeeze out a little more through some weird side-effect.
> > > > > > >>>>
> > > > > > >>>> Well, that's hopefully overboard in scaring, but in the end, I would
> > > > > > >>>> like to see UABI documented so I can have a feeling of what it is for
> > > > > > >>>> and how it was intended to be used. That's all.
> > > > > > >>>
> > > > > > >>> We share the same concern. If you read elsewhere in these threads you
> > > > > > >>> will notice I have been calling this an "arms race". If the ability to
> > > > > > >>> make yourself go faster does not required additional privilege I also
> > > > > > >>> worry everyone will do it at which point it becomes pointless. So yes, I
> > > > > > >>> do share this concern about exposing any of this as an unprivileged uapi.
> > > > > > >>>
> > > > > > >>> Is it possible to limit access to only compositors in some sane way?
> > > > > > >>> Sounds tricky when dma-fence should be disconnected from DRM..
> > > > > > >>
> > > > > > >> Maybe it's not that bad in this particular case, because we are talking
> > > > > > >> only about boosting GPU clocks which benefits everyone (except
> > > > > > >> battery life) and it does not penalize other programs like e.g.
> > > > > > >> job priorities do.
> > > > > > >
> > > > > > > Apart from efficiency that you mentioned, which does not always favor
> > > > > > > higher clocks, sometimes thermal budget is also shared between CPU and
> > > > > > > GPU. So more GPU clocks can mean fewer CPU clocks. It's really hard to
> > > > > > > make optimal choices without the full coordination between both schedulers.
> > > > > > >
> > > > > > > But that is even not the main point, which is that if everyone sets the
> > > > > > > immediate deadline then having the deadline API is a bit pointless. For
> > > > > > > instance there is a reason negative nice needs CAP_SYS_ADMIN.
> > > > > > >
> > > > > > > However Rob has also pointed out the existence of uclamp.min via
> > > > > > > sched_setattr which is unprivileged and can influence frequency
> > > > > > > selection in the CPU world, so I conceded on that point. If CPU world
> > > > > > > has accepted it so can we I guess.
> > > > > > >
> > > > > > > So IMO we are back to whether we can agree defining it is a hint is good
> > > > > > > enough, be in via the name of the ioctl/flag itself or via documentation.
> > > > > > >
> > > > > > >> Drivers are not going to use the deadline for scheduling priorities,
> > > > > > >> right? I don't recall seeing any mention of that.
> > > > > > >>
> > > > > > >> ...right?
> > > > > > >
> > > > > > > I wouldn't have thought it would be beneficial to preclude that, or
> > > > > > > assume what drivers would do with the info to begin with.
> > > > > > >
> > > > > > > For instance in i915 we almost had a deadline based scheduler which was
> > > > > > > much fairer than the current priority sorted fifo and in an ideal world
> > > > > > > we would either revive or re-implement that idea. In which case
> > > > > > > considering the fence deadline would naturally slot in and give true
> > > > > > > integration with compositor deadlines (not just boost clocks and pray it
> > > > > > > helps).
> > > > > > How is user-space to decide whether to use ioctl(SET_DEADLINE) or
> > > > > > poll(POLLPRI)?
> > > > >
> > > > > Implementation of blocking gl/vk/cl APIs, like glFinish() would use
> > > > > poll(POLLPRI).  It could also set an immediate deadline and then call
> > > > > poll() without POLLPRI.
> > > > >
> > > > > Other than compositors which do frame-pacing I expect the main usage
> > > > > of either of these is mesa.
> > > >
> > > > Okay, so it looks like we already agreed that having a way to bump frequency
> > > > from userspace is acceptable. either because there are already other ways
> > > > that you can waste power and because this already acceptable in the CPU
> > > > world.
> > > >
> > > > But why we are doing this in hidden ways then?
> > > >
> > > > Why can't we have this hint per context that is getting executed?
> > > > (either with a boost-context flag or with some low/med/max or '-1' to '1'
> > > > value like the latency priority)?
> > > >
> > > > I don't like the waitboost because this heurisitic fails in some media cases.
> > > > I don't like the global setting because we might be alternating a top-priority
> > > > with low-priority cases...
> > > >
> > > > So, why not something per context in execution?
> > > >
> > >
> > > It needs to be finer granularity than per-context, because not all
> > > waits should trigger boosting.  For example, virglrenderer ends up
> > > with a thread polling unsignaled fences to know when to signal an
> > > interrupt to the guest virtgpu.  This alone shouldn't trigger
> > > boosting.  (We also wouldn't want to completely disable boosting for
> > > virglrenderer.)  Or the usermode driver could be waiting on a fence to
> > > know when to do some cleanup.
> > >
> > > That is not to say that there isn't room for per-context flags to
> > > disable/enable boosting for fences created by that context, meaning it
> > > could be an AND operation for i915 if it needs to be.
> >
> > First of all, I believe that the fence deadline hint is a good idea.
> > With that being said, I also don't think it is sufficient in a lot of
> > cases.
> >
> > The one thing I was alluding to before and that Pekka mentioned as
> > well is that mutter for example has a problem where we're missing the
> > deadline consistently because the clocks don't ramp up fast enough and
> > there is a MR which is just trying to keep the GPU busy to avoid this.
>
> the dynamic double/triple buffer thing?

Yes

> > It would be much better if the kernel could make sure the clocks are
> > all ramped up when we start submitting work. In the compositor we
> > actually have a lot of information that *should* influence clocks. We
> > know when we're going to start submitting work and when the deadline
> > for that work is beforehand. We know which windows are visible, and
> > which one should have the highest priority.
>
> This sounds like something orthogonal.. something for cgroups?  Ie.
> android moves visible/foreground apps to a different cgroup to given
> them higher priority.  Tvrtko had a patchset to add drm cgroup
> support..

For the priority stuff, yes, probably. The visibility information on
the other hand could be used to determine if we want to ramp up the
GPU in the first place.

> > We know when there are
> > input events which actually matter.
>
> This I see input as a different boost source for the driver.  (Ie. one
> boost signal is missing fence deadlines, another is input events,
> etc.)
>
> We end up using downstream input-handlers on the kernel side for this.
> Partially for the freq boost (but mostly not, UI interactive workloads
> like touchscreen scrolling don't generally need high GPU freqs, they
> are more memory bandwidth limited if they are limited by anything)..
> really the reason here is to get a head-start on the ~2ms that it
> takes to power up the GPU if it is suspended.

Right, but one of my main points I want to make here is that we could
get the head-start not only in response to input events but also for
the GPU work the compositor submits and in the future also to GPU work
that clients commit. Except that we don't have a way to tell the
kernel about it.

> But this is not quite perfect, since for example some keys should be
> handled on key-down but others on key-up.
>
> But again, this is something different from fence deadlines.  I'm
> interested in proposals because we do need something for this.  But I
> think it is something is orthogonal to this series.  For input, we
> want the kernel to know long before userspace is ready to submit
> rendering.

We can do that in the compositor! Input events are really not
something you should care about in the kernel. Input itself is also
not the only indication of incoming animated content. Some network
packets arriving could equally well result in the same situation.

> > We know when the deadline for
> > client work is.
> >
> > In the future we also want to make sure clients know beforehand when
> > they should start their work and when the deadline is but that's all
> > very much WIP in both wayland and vulkan.
> >
> > There are two issues:
> >
> > 1. The compositor has no way to communicate any of that information to
> > the kernel.
> > 2. The only connection to client work the compositor has is a fence to
> > the last bit of work that must be done before the deadline after a
> > wl_surface.commit.
>
> If the client isn't using multiple GPUs, a single fence should be
> sufficient.  And even if it is, well we still have all the dependency
> information on the kernel side.  Ie. drm/sched knows what fences it is
> waiting on if it is waiting to schedule the work associated with the
> last fence.  It would otherwise require drm/sched to be a bit more
> tricky than it is so far in this series.
>
> But I think the normal dual-gpu case, the app is only dealing with a single GPU?

We generally don't know which GPU a client uses though. We know which
one we're using and tell the client that the buffer should be
compatible with it but that's the extent of information we have, until
we get a fence but that fence usually gets to the compositor pretty
late. Way too late for the compositor to tell the kernel to ramp up
the GPU and still have an impact.

It also seems like we're moving away from tracking execution
dependencies with fences when we're switching to user mode fences.

> > So in both cases a fence is just not the right primitive for us. We
> > need to be able to provide per-context/queue information for work that
> > will happen in the future and we need a way to refer to a
> > context/queue generically and over IPC to boost the clocks of the
> > device that a client is actually using and maybe even give priority.
> >
> > But like I said, having a per-fence deadline is probably still a good
> > idea and doesn't conflict with any of the more coarse information.
>
> Yeah, I think the thing is you need multiple things, and this is only
> one of them ;-)
>
> BR,
> -R
>

Rob Clark Feb. 28, 2023, 10:52 p.m. UTC | #32

On Tue, Feb 28, 2023 at 6:30 AM Sebastian Wick
<sebastian.wick@redhat.com> wrote:
>
> On Tue, Feb 28, 2023 at 12:48 AM Rob Clark <robdclark@gmail.com> wrote:
> >
> > On Mon, Feb 27, 2023 at 2:44 PM Sebastian Wick
> > <sebastian.wick@redhat.com> wrote:
> > >
> > > On Mon, Feb 27, 2023 at 11:20 PM Rob Clark <robdclark@gmail.com> wrote:
> > > >
> > > > On Mon, Feb 27, 2023 at 1:36 PM Rodrigo Vivi <rodrigo.vivi@intel.com> wrote:
> > > > >
> > > > > On Fri, Feb 24, 2023 at 09:59:57AM -0800, Rob Clark wrote:
> > > > > > On Fri, Feb 24, 2023 at 7:27 AM Luben Tuikov <luben.tuikov@amd.com> wrote:
> > > > > > >
> > > > > > > On 2023-02-24 06:37, Tvrtko Ursulin wrote:
> > > > > > > >
> > > > > > > > On 24/02/2023 11:00, Pekka Paalanen wrote:
> > > > > > > >> On Fri, 24 Feb 2023 10:50:51 +0000
> > > > > > > >> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > > > > > >>
> > > > > > > >>> On 24/02/2023 10:24, Pekka Paalanen wrote:
> > > > > > > >>>> On Fri, 24 Feb 2023 09:41:46 +0000
> > > > > > > >>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > > > > > >>>>
> > > > > > > >>>>> On 24/02/2023 09:26, Pekka Paalanen wrote:
> > > > > > > >>>>>> On Thu, 23 Feb 2023 10:51:48 -0800
> > > > > > > >>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > > > > > >>>>>>
> > > > > > > >>>>>>> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> On Wed, 22 Feb 2023 07:37:26 -0800
> > > > > > > >>>>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > > > > >>>>>>
> > > > > > > >>>>>> ...
> > > > > > > >>>>>>
> > > > > > > >>>>>>>>>> On another matter, if the application uses SET_DEADLINE with one
> > > > > > > >>>>>>>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > > > > > > >>>>>>>>>> another timestamp, what should happen?
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> The expectation is that many deadline hints can be set on a fence.
> > > > > > > >>>>>>>>> The fence signaller should track the soonest deadline.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> You need to document that as UAPI, since it is observable to userspace.
> > > > > > > >>>>>>>> It would be bad if drivers or subsystems would differ in behaviour.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> It is in the end a hint.  It is about giving the driver more
> > > > > > > >>>>>>> information so that it can make better choices.  But the driver is
> > > > > > > >>>>>>> even free to ignore it.  So maybe "expectation" is too strong of a
> > > > > > > >>>>>>> word.  Rather, any other behavior doesn't really make sense.  But it
> > > > > > > >>>>>>> could end up being dictated by how the hw and/or fw works.
> > > > > > > >>>>>>
> > > > > > > >>>>>> It will stop being a hint once it has been implemented and used in the
> > > > > > > >>>>>> wild long enough. The kernel userspace regression rules make sure of
> > > > > > > >>>>>> that.
> > > > > > > >>>>>
> > > > > > > >>>>> Yeah, tricky and maybe a gray area in this case. I think we eluded
> > > > > > > >>>>> elsewhere in the thread that renaming the thing might be an option.
> > > > > > > >>>>>
> > > > > > > >>>>> So maybe instead of deadline, which is a very strong word, use something
> > > > > > > >>>>> along the lines of "present time hint", or "signalled time hint"? Maybe
> > > > > > > >>>>> reads clumsy. Just throwing some ideas for a start.
> > > > > > > >>>>
> > > > > > > >>>> You can try, but I fear that if it ever changes behaviour and
> > > > > > > >>>> someone notices that, it's labelled as a kernel regression. I don't
> > > > > > > >>>> think documentation has ever been the authoritative definition of UABI
> > > > > > > >>>> in Linux, it just guides drivers and userspace towards a common
> > > > > > > >>>> understanding and common usage patterns.
> > > > > > > >>>>
> > > > > > > >>>> So even if the UABI contract is not documented (ugh), you need to be
> > > > > > > >>>> prepared to set the UABI contract through kernel implementation.
> > > > > > > >>>
> > > > > > > >>> To be the devil's advocate it probably wouldn't be an ABI regression but
> > > > > > > >>> just an regression. Same way as what nice(2) priorities mean hasn't
> > > > > > > >>> always been the same over the years, I don't think there is a strict
> > > > > > > >>> contract.
> > > > > > > >>>
> > > > > > > >>> Having said that, it may be different with latency sensitive stuff such
> > > > > > > >>> as UIs though since it is very observable and can be very painful to users.
> > > > > > > >>>
> > > > > > > >>>> If you do not document the UABI contract, then different drivers are
> > > > > > > >>>> likely to implement it differently, leading to differing behaviour.
> > > > > > > >>>> Also userspace will invent wild ways to abuse the UABI if there is no
> > > > > > > >>>> documentation guiding it on proper use. If userspace or end users
> > > > > > > >>>> observe different behaviour, that's bad even if it's not a regression.
> > > > > > > >>>>
> > > > > > > >>>> I don't like the situation either, but it is what it is. UABI stability
> > > > > > > >>>> trumps everything regardless of whether it was documented or not.
> > > > > > > >>>>
> > > > > > > >>>> I bet userspace is going to use this as a "make it faster, make it
> > > > > > > >>>> hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> > > > > > > >>>> library that stamps any and all fences with an expired deadline to
> > > > > > > >>>> just squeeze out a little more through some weird side-effect.
> > > > > > > >>>>
> > > > > > > >>>> Well, that's hopefully overboard in scaring, but in the end, I would
> > > > > > > >>>> like to see UABI documented so I can have a feeling of what it is for
> > > > > > > >>>> and how it was intended to be used. That's all.
> > > > > > > >>>
> > > > > > > >>> We share the same concern. If you read elsewhere in these threads you
> > > > > > > >>> will notice I have been calling this an "arms race". If the ability to
> > > > > > > >>> make yourself go faster does not required additional privilege I also
> > > > > > > >>> worry everyone will do it at which point it becomes pointless. So yes, I
> > > > > > > >>> do share this concern about exposing any of this as an unprivileged uapi.
> > > > > > > >>>
> > > > > > > >>> Is it possible to limit access to only compositors in some sane way?
> > > > > > > >>> Sounds tricky when dma-fence should be disconnected from DRM..
> > > > > > > >>
> > > > > > > >> Maybe it's not that bad in this particular case, because we are talking
> > > > > > > >> only about boosting GPU clocks which benefits everyone (except
> > > > > > > >> battery life) and it does not penalize other programs like e.g.
> > > > > > > >> job priorities do.
> > > > > > > >
> > > > > > > > Apart from efficiency that you mentioned, which does not always favor
> > > > > > > > higher clocks, sometimes thermal budget is also shared between CPU and
> > > > > > > > GPU. So more GPU clocks can mean fewer CPU clocks. It's really hard to
> > > > > > > > make optimal choices without the full coordination between both schedulers.
> > > > > > > >
> > > > > > > > But that is even not the main point, which is that if everyone sets the
> > > > > > > > immediate deadline then having the deadline API is a bit pointless. For
> > > > > > > > instance there is a reason negative nice needs CAP_SYS_ADMIN.
> > > > > > > >
> > > > > > > > However Rob has also pointed out the existence of uclamp.min via
> > > > > > > > sched_setattr which is unprivileged and can influence frequency
> > > > > > > > selection in the CPU world, so I conceded on that point. If CPU world
> > > > > > > > has accepted it so can we I guess.
> > > > > > > >
> > > > > > > > So IMO we are back to whether we can agree defining it is a hint is good
> > > > > > > > enough, be in via the name of the ioctl/flag itself or via documentation.
> > > > > > > >
> > > > > > > >> Drivers are not going to use the deadline for scheduling priorities,
> > > > > > > >> right? I don't recall seeing any mention of that.
> > > > > > > >>
> > > > > > > >> ...right?
> > > > > > > >
> > > > > > > > I wouldn't have thought it would be beneficial to preclude that, or
> > > > > > > > assume what drivers would do with the info to begin with.
> > > > > > > >
> > > > > > > > For instance in i915 we almost had a deadline based scheduler which was
> > > > > > > > much fairer than the current priority sorted fifo and in an ideal world
> > > > > > > > we would either revive or re-implement that idea. In which case
> > > > > > > > considering the fence deadline would naturally slot in and give true
> > > > > > > > integration with compositor deadlines (not just boost clocks and pray it
> > > > > > > > helps).
> > > > > > > How is user-space to decide whether to use ioctl(SET_DEADLINE) or
> > > > > > > poll(POLLPRI)?
> > > > > >
> > > > > > Implementation of blocking gl/vk/cl APIs, like glFinish() would use
> > > > > > poll(POLLPRI).  It could also set an immediate deadline and then call
> > > > > > poll() without POLLPRI.
> > > > > >
> > > > > > Other than compositors which do frame-pacing I expect the main usage
> > > > > > of either of these is mesa.
> > > > >
> > > > > Okay, so it looks like we already agreed that having a way to bump frequency
> > > > > from userspace is acceptable. either because there are already other ways
> > > > > that you can waste power and because this already acceptable in the CPU
> > > > > world.
> > > > >
> > > > > But why we are doing this in hidden ways then?
> > > > >
> > > > > Why can't we have this hint per context that is getting executed?
> > > > > (either with a boost-context flag or with some low/med/max or '-1' to '1'
> > > > > value like the latency priority)?
> > > > >
> > > > > I don't like the waitboost because this heurisitic fails in some media cases.
> > > > > I don't like the global setting because we might be alternating a top-priority
> > > > > with low-priority cases...
> > > > >
> > > > > So, why not something per context in execution?
> > > > >
> > > >
> > > > It needs to be finer granularity than per-context, because not all
> > > > waits should trigger boosting.  For example, virglrenderer ends up
> > > > with a thread polling unsignaled fences to know when to signal an
> > > > interrupt to the guest virtgpu.  This alone shouldn't trigger
> > > > boosting.  (We also wouldn't want to completely disable boosting for
> > > > virglrenderer.)  Or the usermode driver could be waiting on a fence to
> > > > know when to do some cleanup.
> > > >
> > > > That is not to say that there isn't room for per-context flags to
> > > > disable/enable boosting for fences created by that context, meaning it
> > > > could be an AND operation for i915 if it needs to be.
> > >
> > > First of all, I believe that the fence deadline hint is a good idea.
> > > With that being said, I also don't think it is sufficient in a lot of
> > > cases.
> > >
> > > The one thing I was alluding to before and that Pekka mentioned as
> > > well is that mutter for example has a problem where we're missing the
> > > deadline consistently because the clocks don't ramp up fast enough and
> > > there is a MR which is just trying to keep the GPU busy to avoid this.
> >
> > the dynamic double/triple buffer thing?
>
> Yes
>
> > > It would be much better if the kernel could make sure the clocks are
> > > all ramped up when we start submitting work. In the compositor we
> > > actually have a lot of information that *should* influence clocks. We
> > > know when we're going to start submitting work and when the deadline
> > > for that work is beforehand. We know which windows are visible, and
> > > which one should have the highest priority.
> >
> > This sounds like something orthogonal.. something for cgroups?  Ie.
> > android moves visible/foreground apps to a different cgroup to given
> > them higher priority.  Tvrtko had a patchset to add drm cgroup
> > support..
>
> For the priority stuff, yes, probably. The visibility information on
> the other hand could be used to determine if we want to ramp up the
> GPU in the first place.

Right, but I think that we could have multiple cgroup based knobs, one
that adjusts priority and one that limits/disables deadline based
boost?  This way the compositor could setup different policies for
visible vs hidden apps influencing both how much time they get on the
GPU and boost.

> > > We know when there are
> > > input events which actually matter.
> >
> > This I see input as a different boost source for the driver.  (Ie. one
> > boost signal is missing fence deadlines, another is input events,
> > etc.)
> >
> > We end up using downstream input-handlers on the kernel side for this.
> > Partially for the freq boost (but mostly not, UI interactive workloads
> > like touchscreen scrolling don't generally need high GPU freqs, they
> > are more memory bandwidth limited if they are limited by anything)..
> > really the reason here is to get a head-start on the ~2ms that it
> > takes to power up the GPU if it is suspended.
>
> Right, but one of my main points I want to make here is that we could
> get the head-start not only in response to input events but also for
> the GPU work the compositor submits and in the future also to GPU work
> that clients commit. Except that we don't have a way to tell the
> kernel about it.
>
> > But this is not quite perfect, since for example some keys should be
> > handled on key-down but others on key-up.
> >
> > But again, this is something different from fence deadlines.  I'm
> > interested in proposals because we do need something for this.  But I
> > think it is something is orthogonal to this series.  For input, we
> > want the kernel to know long before userspace is ready to submit
> > rendering.
>
> We can do that in the compositor! Input events are really not
> something you should care about in the kernel. Input itself is also
> not the only indication of incoming animated content. Some network
> packets arriving could equally well result in the same situation.

We do use input boost not just for GPU freq, but also for CPU freq.
It seems like triggering it from the kernel could happen somewhat
sooner than userspace.  (But just speculation.)

I'm not sure network packets count.  Or at least compared to a touch
interface, the user won't perceive the lag nearly as much, compared to
whether or not the UI tracks their fingertips.

> > > We know when the deadline for
> > > client work is.
> > >
> > > In the future we also want to make sure clients know beforehand when
> > > they should start their work and when the deadline is but that's all
> > > very much WIP in both wayland and vulkan.
> > >
> > > There are two issues:
> > >
> > > 1. The compositor has no way to communicate any of that information to
> > > the kernel.
> > > 2. The only connection to client work the compositor has is a fence to
> > > the last bit of work that must be done before the deadline after a
> > > wl_surface.commit.
> >
> > If the client isn't using multiple GPUs, a single fence should be
> > sufficient.  And even if it is, well we still have all the dependency
> > information on the kernel side.  Ie. drm/sched knows what fences it is
> > waiting on if it is waiting to schedule the work associated with the
> > last fence.  It would otherwise require drm/sched to be a bit more
> > tricky than it is so far in this series.
> >
> > But I think the normal dual-gpu case, the app is only dealing with a single GPU?
>
> We generally don't know which GPU a client uses though. We know which
> one we're using and tell the client that the buffer should be
> compatible with it but that's the extent of information we have, until
> we get a fence but that fence usually gets to the compositor pretty
> late. Way too late for the compositor to tell the kernel to ramp up
> the GPU and still have an impact.

Are you sure about that?  I'd have expected the app to hand off
fence+buffer to the compositor pretty quickly after rendering is
submitted.  This is what I've seen elsewhere.

> It also seems like we're moving away from tracking execution
> dependencies with fences when we're switching to user mode fences.

I suppose there are two cases..

1) dependent fences all from the same device.. no prob, the right
driver gets the signal that it needs to clk up, and figures out the
rest on it's own

2) dependent fences from different devices.. I suppose if device B is
waiting for a fence from device A before it can make forward progress,
why not express this as a deadline hint on A's fence?  (But by the
time you have this problem, you are probably not really caring about
power consumption, so meh..)

BR,
-R

> > > So in both cases a fence is just not the right primitive for us. We
> > > need to be able to provide per-context/queue information for work that
> > > will happen in the future and we need a way to refer to a
> > > context/queue generically and over IPC to boost the clocks of the
> > > device that a client is actually using and maybe even give priority.
> > >
> > > But like I said, having a per-fence deadline is probably still a good
> > > idea and doesn't conflict with any of the more coarse information.
> >
> > Yeah, I think the thing is you need multiple things, and this is only
> > one of them ;-)
> >
> > BR,
> > -R
> >
>

Sebastian Wick March 1, 2023, 3:31 p.m. UTC | #33

On Tue, Feb 28, 2023 at 11:52 PM Rob Clark <robdclark@gmail.com> wrote:
>
> On Tue, Feb 28, 2023 at 6:30 AM Sebastian Wick
> <sebastian.wick@redhat.com> wrote:
> >
> > On Tue, Feb 28, 2023 at 12:48 AM Rob Clark <robdclark@gmail.com> wrote:
> > >
> > > On Mon, Feb 27, 2023 at 2:44 PM Sebastian Wick
> > > <sebastian.wick@redhat.com> wrote:
> > > >
> > > > On Mon, Feb 27, 2023 at 11:20 PM Rob Clark <robdclark@gmail.com> wrote:
> > > > >
> > > > > On Mon, Feb 27, 2023 at 1:36 PM Rodrigo Vivi <rodrigo.vivi@intel.com> wrote:
> > > > > >
> > > > > > On Fri, Feb 24, 2023 at 09:59:57AM -0800, Rob Clark wrote:
> > > > > > > On Fri, Feb 24, 2023 at 7:27 AM Luben Tuikov <luben.tuikov@amd.com> wrote:
> > > > > > > >
> > > > > > > > On 2023-02-24 06:37, Tvrtko Ursulin wrote:
> > > > > > > > >
> > > > > > > > > On 24/02/2023 11:00, Pekka Paalanen wrote:
> > > > > > > > >> On Fri, 24 Feb 2023 10:50:51 +0000
> > > > > > > > >> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > > > > > > >>
> > > > > > > > >>> On 24/02/2023 10:24, Pekka Paalanen wrote:
> > > > > > > > >>>> On Fri, 24 Feb 2023 09:41:46 +0000
> > > > > > > > >>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > > > > > > >>>>
> > > > > > > > >>>>> On 24/02/2023 09:26, Pekka Paalanen wrote:
> > > > > > > > >>>>>> On Thu, 23 Feb 2023 10:51:48 -0800
> > > > > > > > >>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > > > > > > >>>>>>
> > > > > > > > >>>>>>> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > > > > > >>>>>>>>
> > > > > > > > >>>>>>>> On Wed, 22 Feb 2023 07:37:26 -0800
> > > > > > > > >>>>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > > > > > > >>>>>>>>
> > > > > > > > >>>>>>>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> ...
> > > > > > > > >>>>>>
> > > > > > > > >>>>>>>>>> On another matter, if the application uses SET_DEADLINE with one
> > > > > > > > >>>>>>>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > > > > > > > >>>>>>>>>> another timestamp, what should happen?
> > > > > > > > >>>>>>>>>
> > > > > > > > >>>>>>>>> The expectation is that many deadline hints can be set on a fence.
> > > > > > > > >>>>>>>>> The fence signaller should track the soonest deadline.
> > > > > > > > >>>>>>>>
> > > > > > > > >>>>>>>> You need to document that as UAPI, since it is observable to userspace.
> > > > > > > > >>>>>>>> It would be bad if drivers or subsystems would differ in behaviour.
> > > > > > > > >>>>>>>>
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>> It is in the end a hint.  It is about giving the driver more
> > > > > > > > >>>>>>> information so that it can make better choices.  But the driver is
> > > > > > > > >>>>>>> even free to ignore it.  So maybe "expectation" is too strong of a
> > > > > > > > >>>>>>> word.  Rather, any other behavior doesn't really make sense.  But it
> > > > > > > > >>>>>>> could end up being dictated by how the hw and/or fw works.
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> It will stop being a hint once it has been implemented and used in the
> > > > > > > > >>>>>> wild long enough. The kernel userspace regression rules make sure of
> > > > > > > > >>>>>> that.
> > > > > > > > >>>>>
> > > > > > > > >>>>> Yeah, tricky and maybe a gray area in this case. I think we eluded
> > > > > > > > >>>>> elsewhere in the thread that renaming the thing might be an option.
> > > > > > > > >>>>>
> > > > > > > > >>>>> So maybe instead of deadline, which is a very strong word, use something
> > > > > > > > >>>>> along the lines of "present time hint", or "signalled time hint"? Maybe
> > > > > > > > >>>>> reads clumsy. Just throwing some ideas for a start.
> > > > > > > > >>>>
> > > > > > > > >>>> You can try, but I fear that if it ever changes behaviour and
> > > > > > > > >>>> someone notices that, it's labelled as a kernel regression. I don't
> > > > > > > > >>>> think documentation has ever been the authoritative definition of UABI
> > > > > > > > >>>> in Linux, it just guides drivers and userspace towards a common
> > > > > > > > >>>> understanding and common usage patterns.
> > > > > > > > >>>>
> > > > > > > > >>>> So even if the UABI contract is not documented (ugh), you need to be
> > > > > > > > >>>> prepared to set the UABI contract through kernel implementation.
> > > > > > > > >>>
> > > > > > > > >>> To be the devil's advocate it probably wouldn't be an ABI regression but
> > > > > > > > >>> just an regression. Same way as what nice(2) priorities mean hasn't
> > > > > > > > >>> always been the same over the years, I don't think there is a strict
> > > > > > > > >>> contract.
> > > > > > > > >>>
> > > > > > > > >>> Having said that, it may be different with latency sensitive stuff such
> > > > > > > > >>> as UIs though since it is very observable and can be very painful to users.
> > > > > > > > >>>
> > > > > > > > >>>> If you do not document the UABI contract, then different drivers are
> > > > > > > > >>>> likely to implement it differently, leading to differing behaviour.
> > > > > > > > >>>> Also userspace will invent wild ways to abuse the UABI if there is no
> > > > > > > > >>>> documentation guiding it on proper use. If userspace or end users
> > > > > > > > >>>> observe different behaviour, that's bad even if it's not a regression.
> > > > > > > > >>>>
> > > > > > > > >>>> I don't like the situation either, but it is what it is. UABI stability
> > > > > > > > >>>> trumps everything regardless of whether it was documented or not.
> > > > > > > > >>>>
> > > > > > > > >>>> I bet userspace is going to use this as a "make it faster, make it
> > > > > > > > >>>> hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> > > > > > > > >>>> library that stamps any and all fences with an expired deadline to
> > > > > > > > >>>> just squeeze out a little more through some weird side-effect.
> > > > > > > > >>>>
> > > > > > > > >>>> Well, that's hopefully overboard in scaring, but in the end, I would
> > > > > > > > >>>> like to see UABI documented so I can have a feeling of what it is for
> > > > > > > > >>>> and how it was intended to be used. That's all.
> > > > > > > > >>>
> > > > > > > > >>> We share the same concern. If you read elsewhere in these threads you
> > > > > > > > >>> will notice I have been calling this an "arms race". If the ability to
> > > > > > > > >>> make yourself go faster does not required additional privilege I also
> > > > > > > > >>> worry everyone will do it at which point it becomes pointless. So yes, I
> > > > > > > > >>> do share this concern about exposing any of this as an unprivileged uapi.
> > > > > > > > >>>
> > > > > > > > >>> Is it possible to limit access to only compositors in some sane way?
> > > > > > > > >>> Sounds tricky when dma-fence should be disconnected from DRM..
> > > > > > > > >>
> > > > > > > > >> Maybe it's not that bad in this particular case, because we are talking
> > > > > > > > >> only about boosting GPU clocks which benefits everyone (except
> > > > > > > > >> battery life) and it does not penalize other programs like e.g.
> > > > > > > > >> job priorities do.
> > > > > > > > >
> > > > > > > > > Apart from efficiency that you mentioned, which does not always favor
> > > > > > > > > higher clocks, sometimes thermal budget is also shared between CPU and
> > > > > > > > > GPU. So more GPU clocks can mean fewer CPU clocks. It's really hard to
> > > > > > > > > make optimal choices without the full coordination between both schedulers.
> > > > > > > > >
> > > > > > > > > But that is even not the main point, which is that if everyone sets the
> > > > > > > > > immediate deadline then having the deadline API is a bit pointless. For
> > > > > > > > > instance there is a reason negative nice needs CAP_SYS_ADMIN.
> > > > > > > > >
> > > > > > > > > However Rob has also pointed out the existence of uclamp.min via
> > > > > > > > > sched_setattr which is unprivileged and can influence frequency
> > > > > > > > > selection in the CPU world, so I conceded on that point. If CPU world
> > > > > > > > > has accepted it so can we I guess.
> > > > > > > > >
> > > > > > > > > So IMO we are back to whether we can agree defining it is a hint is good
> > > > > > > > > enough, be in via the name of the ioctl/flag itself or via documentation.
> > > > > > > > >
> > > > > > > > >> Drivers are not going to use the deadline for scheduling priorities,
> > > > > > > > >> right? I don't recall seeing any mention of that.
> > > > > > > > >>
> > > > > > > > >> ...right?
> > > > > > > > >
> > > > > > > > > I wouldn't have thought it would be beneficial to preclude that, or
> > > > > > > > > assume what drivers would do with the info to begin with.
> > > > > > > > >
> > > > > > > > > For instance in i915 we almost had a deadline based scheduler which was
> > > > > > > > > much fairer than the current priority sorted fifo and in an ideal world
> > > > > > > > > we would either revive or re-implement that idea. In which case
> > > > > > > > > considering the fence deadline would naturally slot in and give true
> > > > > > > > > integration with compositor deadlines (not just boost clocks and pray it
> > > > > > > > > helps).
> > > > > > > > How is user-space to decide whether to use ioctl(SET_DEADLINE) or
> > > > > > > > poll(POLLPRI)?
> > > > > > >
> > > > > > > Implementation of blocking gl/vk/cl APIs, like glFinish() would use
> > > > > > > poll(POLLPRI).  It could also set an immediate deadline and then call
> > > > > > > poll() without POLLPRI.
> > > > > > >
> > > > > > > Other than compositors which do frame-pacing I expect the main usage
> > > > > > > of either of these is mesa.
> > > > > >
> > > > > > Okay, so it looks like we already agreed that having a way to bump frequency
> > > > > > from userspace is acceptable. either because there are already other ways
> > > > > > that you can waste power and because this already acceptable in the CPU
> > > > > > world.
> > > > > >
> > > > > > But why we are doing this in hidden ways then?
> > > > > >
> > > > > > Why can't we have this hint per context that is getting executed?
> > > > > > (either with a boost-context flag or with some low/med/max or '-1' to '1'
> > > > > > value like the latency priority)?
> > > > > >
> > > > > > I don't like the waitboost because this heurisitic fails in some media cases.
> > > > > > I don't like the global setting because we might be alternating a top-priority
> > > > > > with low-priority cases...
> > > > > >
> > > > > > So, why not something per context in execution?
> > > > > >
> > > > >
> > > > > It needs to be finer granularity than per-context, because not all
> > > > > waits should trigger boosting.  For example, virglrenderer ends up
> > > > > with a thread polling unsignaled fences to know when to signal an
> > > > > interrupt to the guest virtgpu.  This alone shouldn't trigger
> > > > > boosting.  (We also wouldn't want to completely disable boosting for
> > > > > virglrenderer.)  Or the usermode driver could be waiting on a fence to
> > > > > know when to do some cleanup.
> > > > >
> > > > > That is not to say that there isn't room for per-context flags to
> > > > > disable/enable boosting for fences created by that context, meaning it
> > > > > could be an AND operation for i915 if it needs to be.
> > > >
> > > > First of all, I believe that the fence deadline hint is a good idea.
> > > > With that being said, I also don't think it is sufficient in a lot of
> > > > cases.
> > > >
> > > > The one thing I was alluding to before and that Pekka mentioned as
> > > > well is that mutter for example has a problem where we're missing the
> > > > deadline consistently because the clocks don't ramp up fast enough and
> > > > there is a MR which is just trying to keep the GPU busy to avoid this.
> > >
> > > the dynamic double/triple buffer thing?
> >
> > Yes
> >
> > > > It would be much better if the kernel could make sure the clocks are
> > > > all ramped up when we start submitting work. In the compositor we
> > > > actually have a lot of information that *should* influence clocks. We
> > > > know when we're going to start submitting work and when the deadline
> > > > for that work is beforehand. We know which windows are visible, and
> > > > which one should have the highest priority.
> > >
> > > This sounds like something orthogonal.. something for cgroups?  Ie.
> > > android moves visible/foreground apps to a different cgroup to given
> > > them higher priority.  Tvrtko had a patchset to add drm cgroup
> > > support..
> >
> > For the priority stuff, yes, probably. The visibility information on
> > the other hand could be used to determine if we want to ramp up the
> > GPU in the first place.
>
> Right, but I think that we could have multiple cgroup based knobs, one
> that adjusts priority and one that limits/disables deadline based
> boost?  This way the compositor could setup different policies for
> visible vs hidden apps influencing both how much time they get on the
> GPU and boost.

I'm not sure if a negative control really makes sense. There are
probably timing sensitive use cases where the result doesn't show up
on the local screen and penalizing them when they are not focused or
hidden might also not be the best idea.

> > > > We know when there are
> > > > input events which actually matter.
> > >
> > > This I see input as a different boost source for the driver.  (Ie. one
> > > boost signal is missing fence deadlines, another is input events,
> > > etc.)
> > >
> > > We end up using downstream input-handlers on the kernel side for this.
> > > Partially for the freq boost (but mostly not, UI interactive workloads
> > > like touchscreen scrolling don't generally need high GPU freqs, they
> > > are more memory bandwidth limited if they are limited by anything)..
> > > really the reason here is to get a head-start on the ~2ms that it
> > > takes to power up the GPU if it is suspended.
> >
> > Right, but one of my main points I want to make here is that we could
> > get the head-start not only in response to input events but also for
> > the GPU work the compositor submits and in the future also to GPU work
> > that clients commit. Except that we don't have a way to tell the
> > kernel about it.
> >
> > > But this is not quite perfect, since for example some keys should be
> > > handled on key-down but others on key-up.
> > >
> > > But again, this is something different from fence deadlines.  I'm
> > > interested in proposals because we do need something for this.  But I
> > > think it is something is orthogonal to this series.  For input, we
> > > want the kernel to know long before userspace is ready to submit
> > > rendering.
> >
> > We can do that in the compositor! Input events are really not
> > something you should care about in the kernel. Input itself is also
> > not the only indication of incoming animated content. Some network
> > packets arriving could equally well result in the same situation.
>
> We do use input boost not just for GPU freq, but also for CPU freq.
> It seems like triggering it from the kernel could happen somewhat
> sooner than userspace.  (But just speculation.)

Technically it has to be sooner but I doubt it makes any difference.
Getting a lot of false-positives on the other hand does make a
difference.

> I'm not sure network packets count.  Or at least compared to a touch
> interface, the user won't perceive the lag nearly as much, compared to
> whether or not the UI tracks their fingertips.

Sure, stutter in interactive cases is especially awful but stutter is
awful in general. If we can solve it in all cases we should do so.

> > > > We know when the deadline for
> > > > client work is.
> > > >
> > > > In the future we also want to make sure clients know beforehand when
> > > > they should start their work and when the deadline is but that's all
> > > > very much WIP in both wayland and vulkan.
> > > >
> > > > There are two issues:
> > > >
> > > > 1. The compositor has no way to communicate any of that information to
> > > > the kernel.
> > > > 2. The only connection to client work the compositor has is a fence to
> > > > the last bit of work that must be done before the deadline after a
> > > > wl_surface.commit.
> > >
> > > If the client isn't using multiple GPUs, a single fence should be
> > > sufficient.  And even if it is, well we still have all the dependency
> > > information on the kernel side.  Ie. drm/sched knows what fences it is
> > > waiting on if it is waiting to schedule the work associated with the
> > > last fence.  It would otherwise require drm/sched to be a bit more
> > > tricky than it is so far in this series.
> > >
> > > But I think the normal dual-gpu case, the app is only dealing with a single GPU?
> >
> > We generally don't know which GPU a client uses though. We know which
> > one we're using and tell the client that the buffer should be
> > compatible with it but that's the extent of information we have, until
> > we get a fence but that fence usually gets to the compositor pretty
> > late. Way too late for the compositor to tell the kernel to ramp up
> > the GPU and still have an impact.
>
> Are you sure about that?  I'd have expected the app to hand off
> fence+buffer to the compositor pretty quickly after rendering is
> submitted.  This is what I've seen elsewhere.

After rendering is submitted it is already too late if the GPU takes
2ms to wake up. And if there is no rendering submitted there is no
fence and if there is no fence it has no deadline. If we want to solve
that we also need some kind of 'work will be submitted to the queue
starting from this time' hint.

> > It also seems like we're moving away from tracking execution
> > dependencies with fences when we're switching to user mode fences.
>
> I suppose there are two cases..
>
> 1) dependent fences all from the same device.. no prob, the right
> driver gets the signal that it needs to clk up, and figures out the
> rest on it's own

AFAIU with user mode fences it's impossible for the kernel to figure
out what work depends on it and it might never signal. The whole
deadline on fences thing breaks down with user mode fences.

> 2) dependent fences from different devices.. I suppose if device B is
> waiting for a fence from device A before it can make forward progress,
> why not express this as a deadline hint on A's fence?  (But by the
> time you have this problem, you are probably not really caring about
> power consumption, so meh..)
>
> BR,
> -R
>
> > > > So in both cases a fence is just not the right primitive for us. We
> > > > need to be able to provide per-context/queue information for work that
> > > > will happen in the future and we need a way to refer to a
> > > > context/queue generically and over IPC to boost the clocks of the
> > > > device that a client is actually using and maybe even give priority.
> > > >
> > > > But like I said, having a per-fence deadline is probably still a good
> > > > idea and doesn't conflict with any of the more coarse information.
> > >
> > > Yeah, I think the thing is you need multiple things, and this is only
> > > one of them ;-)
> > >
> > > BR,
> > > -R
> > >
> >
>

Rodrigo Vivi March 1, 2023, 3:45 p.m. UTC | #34

On Mon, Feb 27, 2023 at 02:20:04PM -0800, Rob Clark wrote:
> On Mon, Feb 27, 2023 at 1:36 PM Rodrigo Vivi <rodrigo.vivi@intel.com> wrote:
> >
> > On Fri, Feb 24, 2023 at 09:59:57AM -0800, Rob Clark wrote:
> > > On Fri, Feb 24, 2023 at 7:27 AM Luben Tuikov <luben.tuikov@amd.com> wrote:
> > > >
> > > > On 2023-02-24 06:37, Tvrtko Ursulin wrote:
> > > > >
> > > > > On 24/02/2023 11:00, Pekka Paalanen wrote:
> > > > >> On Fri, 24 Feb 2023 10:50:51 +0000
> > > > >> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > > >>
> > > > >>> On 24/02/2023 10:24, Pekka Paalanen wrote:
> > > > >>>> On Fri, 24 Feb 2023 09:41:46 +0000
> > > > >>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > > >>>>
> > > > >>>>> On 24/02/2023 09:26, Pekka Paalanen wrote:
> > > > >>>>>> On Thu, 23 Feb 2023 10:51:48 -0800
> > > > >>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > > >>>>>>
> > > > >>>>>>> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>> On Wed, 22 Feb 2023 07:37:26 -0800
> > > > >>>>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > >>>>>>
> > > > >>>>>> ...
> > > > >>>>>>
> > > > >>>>>>>>>> On another matter, if the application uses SET_DEADLINE with one
> > > > >>>>>>>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > > > >>>>>>>>>> another timestamp, what should happen?
> > > > >>>>>>>>>
> > > > >>>>>>>>> The expectation is that many deadline hints can be set on a fence.
> > > > >>>>>>>>> The fence signaller should track the soonest deadline.
> > > > >>>>>>>>
> > > > >>>>>>>> You need to document that as UAPI, since it is observable to userspace.
> > > > >>>>>>>> It would be bad if drivers or subsystems would differ in behaviour.
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> It is in the end a hint.  It is about giving the driver more
> > > > >>>>>>> information so that it can make better choices.  But the driver is
> > > > >>>>>>> even free to ignore it.  So maybe "expectation" is too strong of a
> > > > >>>>>>> word.  Rather, any other behavior doesn't really make sense.  But it
> > > > >>>>>>> could end up being dictated by how the hw and/or fw works.
> > > > >>>>>>
> > > > >>>>>> It will stop being a hint once it has been implemented and used in the
> > > > >>>>>> wild long enough. The kernel userspace regression rules make sure of
> > > > >>>>>> that.
> > > > >>>>>
> > > > >>>>> Yeah, tricky and maybe a gray area in this case. I think we eluded
> > > > >>>>> elsewhere in the thread that renaming the thing might be an option.
> > > > >>>>>
> > > > >>>>> So maybe instead of deadline, which is a very strong word, use something
> > > > >>>>> along the lines of "present time hint", or "signalled time hint"? Maybe
> > > > >>>>> reads clumsy. Just throwing some ideas for a start.
> > > > >>>>
> > > > >>>> You can try, but I fear that if it ever changes behaviour and
> > > > >>>> someone notices that, it's labelled as a kernel regression. I don't
> > > > >>>> think documentation has ever been the authoritative definition of UABI
> > > > >>>> in Linux, it just guides drivers and userspace towards a common
> > > > >>>> understanding and common usage patterns.
> > > > >>>>
> > > > >>>> So even if the UABI contract is not documented (ugh), you need to be
> > > > >>>> prepared to set the UABI contract through kernel implementation.
> > > > >>>
> > > > >>> To be the devil's advocate it probably wouldn't be an ABI regression but
> > > > >>> just an regression. Same way as what nice(2) priorities mean hasn't
> > > > >>> always been the same over the years, I don't think there is a strict
> > > > >>> contract.
> > > > >>>
> > > > >>> Having said that, it may be different with latency sensitive stuff such
> > > > >>> as UIs though since it is very observable and can be very painful to users.
> > > > >>>
> > > > >>>> If you do not document the UABI contract, then different drivers are
> > > > >>>> likely to implement it differently, leading to differing behaviour.
> > > > >>>> Also userspace will invent wild ways to abuse the UABI if there is no
> > > > >>>> documentation guiding it on proper use. If userspace or end users
> > > > >>>> observe different behaviour, that's bad even if it's not a regression.
> > > > >>>>
> > > > >>>> I don't like the situation either, but it is what it is. UABI stability
> > > > >>>> trumps everything regardless of whether it was documented or not.
> > > > >>>>
> > > > >>>> I bet userspace is going to use this as a "make it faster, make it
> > > > >>>> hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> > > > >>>> library that stamps any and all fences with an expired deadline to
> > > > >>>> just squeeze out a little more through some weird side-effect.
> > > > >>>>
> > > > >>>> Well, that's hopefully overboard in scaring, but in the end, I would
> > > > >>>> like to see UABI documented so I can have a feeling of what it is for
> > > > >>>> and how it was intended to be used. That's all.
> > > > >>>
> > > > >>> We share the same concern. If you read elsewhere in these threads you
> > > > >>> will notice I have been calling this an "arms race". If the ability to
> > > > >>> make yourself go faster does not required additional privilege I also
> > > > >>> worry everyone will do it at which point it becomes pointless. So yes, I
> > > > >>> do share this concern about exposing any of this as an unprivileged uapi.
> > > > >>>
> > > > >>> Is it possible to limit access to only compositors in some sane way?
> > > > >>> Sounds tricky when dma-fence should be disconnected from DRM..
> > > > >>
> > > > >> Maybe it's not that bad in this particular case, because we are talking
> > > > >> only about boosting GPU clocks which benefits everyone (except
> > > > >> battery life) and it does not penalize other programs like e.g.
> > > > >> job priorities do.
> > > > >
> > > > > Apart from efficiency that you mentioned, which does not always favor
> > > > > higher clocks, sometimes thermal budget is also shared between CPU and
> > > > > GPU. So more GPU clocks can mean fewer CPU clocks. It's really hard to
> > > > > make optimal choices without the full coordination between both schedulers.
> > > > >
> > > > > But that is even not the main point, which is that if everyone sets the
> > > > > immediate deadline then having the deadline API is a bit pointless. For
> > > > > instance there is a reason negative nice needs CAP_SYS_ADMIN.
> > > > >
> > > > > However Rob has also pointed out the existence of uclamp.min via
> > > > > sched_setattr which is unprivileged and can influence frequency
> > > > > selection in the CPU world, so I conceded on that point. If CPU world
> > > > > has accepted it so can we I guess.
> > > > >
> > > > > So IMO we are back to whether we can agree defining it is a hint is good
> > > > > enough, be in via the name of the ioctl/flag itself or via documentation.
> > > > >
> > > > >> Drivers are not going to use the deadline for scheduling priorities,
> > > > >> right? I don't recall seeing any mention of that.
> > > > >>
> > > > >> ...right?
> > > > >
> > > > > I wouldn't have thought it would be beneficial to preclude that, or
> > > > > assume what drivers would do with the info to begin with.
> > > > >
> > > > > For instance in i915 we almost had a deadline based scheduler which was
> > > > > much fairer than the current priority sorted fifo and in an ideal world
> > > > > we would either revive or re-implement that idea. In which case
> > > > > considering the fence deadline would naturally slot in and give true
> > > > > integration with compositor deadlines (not just boost clocks and pray it
> > > > > helps).
> > > > How is user-space to decide whether to use ioctl(SET_DEADLINE) or
> > > > poll(POLLPRI)?
> > >
> > > Implementation of blocking gl/vk/cl APIs, like glFinish() would use
> > > poll(POLLPRI).  It could also set an immediate deadline and then call
> > > poll() without POLLPRI.
> > >
> > > Other than compositors which do frame-pacing I expect the main usage
> > > of either of these is mesa.
> >
> > Okay, so it looks like we already agreed that having a way to bump frequency
> > from userspace is acceptable. either because there are already other ways
> > that you can waste power and because this already acceptable in the CPU
> > world.
> >
> > But why we are doing this in hidden ways then?
> >
> > Why can't we have this hint per context that is getting executed?
> > (either with a boost-context flag or with some low/med/max or '-1' to '1'
> > value like the latency priority)?
> >
> > I don't like the waitboost because this heurisitic fails in some media cases.
> > I don't like the global setting because we might be alternating a top-priority
> > with low-priority cases...
> >
> > So, why not something per context in execution?
> >
> 
> It needs to be finer granularity than per-context, because not all
> waits should trigger boosting.  For example, virglrenderer ends up
> with a thread polling unsignaled fences to know when to signal an
> interrupt to the guest virtgpu.  This alone shouldn't trigger
> boosting.  (We also wouldn't want to completely disable boosting for
> virglrenderer.)  Or the usermode driver could be waiting on a fence to
> know when to do some cleanup.
> 
> That is not to say that there isn't room for per-context flags to
> disable/enable boosting for fences created by that context, meaning it
> could be an AND operation for i915 if it needs to be.

Right. It can be both ways I agree.

> 
> BR,
> -R

Rob Clark March 1, 2023, 4:02 p.m. UTC | #35

On Wed, Mar 1, 2023 at 7:31 AM Sebastian Wick <sebastian.wick@redhat.com> wrote:
>
> On Tue, Feb 28, 2023 at 11:52 PM Rob Clark <robdclark@gmail.com> wrote:
> >
> > On Tue, Feb 28, 2023 at 6:30 AM Sebastian Wick
> > <sebastian.wick@redhat.com> wrote:
> > >
> > > On Tue, Feb 28, 2023 at 12:48 AM Rob Clark <robdclark@gmail.com> wrote:
> > > >
> > > > On Mon, Feb 27, 2023 at 2:44 PM Sebastian Wick
> > > > <sebastian.wick@redhat.com> wrote:
> > > > >
> > > > > On Mon, Feb 27, 2023 at 11:20 PM Rob Clark <robdclark@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Feb 27, 2023 at 1:36 PM Rodrigo Vivi <rodrigo.vivi@intel.com> wrote:
> > > > > > >
> > > > > > > On Fri, Feb 24, 2023 at 09:59:57AM -0800, Rob Clark wrote:
> > > > > > > > On Fri, Feb 24, 2023 at 7:27 AM Luben Tuikov <luben.tuikov@amd.com> wrote:
> > > > > > > > >
> > > > > > > > > On 2023-02-24 06:37, Tvrtko Ursulin wrote:
> > > > > > > > > >
> > > > > > > > > > On 24/02/2023 11:00, Pekka Paalanen wrote:
> > > > > > > > > >> On Fri, 24 Feb 2023 10:50:51 +0000
> > > > > > > > > >> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > > > > > > > >>
> > > > > > > > > >>> On 24/02/2023 10:24, Pekka Paalanen wrote:
> > > > > > > > > >>>> On Fri, 24 Feb 2023 09:41:46 +0000
> > > > > > > > > >>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > > > > > > > >>>>
> > > > > > > > > >>>>> On 24/02/2023 09:26, Pekka Paalanen wrote:
> > > > > > > > > >>>>>> On Thu, 23 Feb 2023 10:51:48 -0800
> > > > > > > > > >>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > > > > > > > >>>>>>
> > > > > > > > > >>>>>>> On Thu, Feb 23, 2023 at 1:38 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > > > > > > >>>>>>>>
> > > > > > > > > >>>>>>>> On Wed, 22 Feb 2023 07:37:26 -0800
> > > > > > > > > >>>>>>>> Rob Clark <robdclark@gmail.com> wrote:
> > > > > > > > > >>>>>>>>
> > > > > > > > > >>>>>>>>> On Wed, Feb 22, 2023 at 1:49 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > > > > > > > >>>>>>
> > > > > > > > > >>>>>> ...
> > > > > > > > > >>>>>>
> > > > > > > > > >>>>>>>>>> On another matter, if the application uses SET_DEADLINE with one
> > > > > > > > > >>>>>>>>>> timestamp, and the compositor uses SET_DEADLINE on the same thing with
> > > > > > > > > >>>>>>>>>> another timestamp, what should happen?
> > > > > > > > > >>>>>>>>>
> > > > > > > > > >>>>>>>>> The expectation is that many deadline hints can be set on a fence.
> > > > > > > > > >>>>>>>>> The fence signaller should track the soonest deadline.
> > > > > > > > > >>>>>>>>
> > > > > > > > > >>>>>>>> You need to document that as UAPI, since it is observable to userspace.
> > > > > > > > > >>>>>>>> It would be bad if drivers or subsystems would differ in behaviour.
> > > > > > > > > >>>>>>>>
> > > > > > > > > >>>>>>>
> > > > > > > > > >>>>>>> It is in the end a hint.  It is about giving the driver more
> > > > > > > > > >>>>>>> information so that it can make better choices.  But the driver is
> > > > > > > > > >>>>>>> even free to ignore it.  So maybe "expectation" is too strong of a
> > > > > > > > > >>>>>>> word.  Rather, any other behavior doesn't really make sense.  But it
> > > > > > > > > >>>>>>> could end up being dictated by how the hw and/or fw works.
> > > > > > > > > >>>>>>
> > > > > > > > > >>>>>> It will stop being a hint once it has been implemented and used in the
> > > > > > > > > >>>>>> wild long enough. The kernel userspace regression rules make sure of
> > > > > > > > > >>>>>> that.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> Yeah, tricky and maybe a gray area in this case. I think we eluded
> > > > > > > > > >>>>> elsewhere in the thread that renaming the thing might be an option.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>> So maybe instead of deadline, which is a very strong word, use something
> > > > > > > > > >>>>> along the lines of "present time hint", or "signalled time hint"? Maybe
> > > > > > > > > >>>>> reads clumsy. Just throwing some ideas for a start.
> > > > > > > > > >>>>
> > > > > > > > > >>>> You can try, but I fear that if it ever changes behaviour and
> > > > > > > > > >>>> someone notices that, it's labelled as a kernel regression. I don't
> > > > > > > > > >>>> think documentation has ever been the authoritative definition of UABI
> > > > > > > > > >>>> in Linux, it just guides drivers and userspace towards a common
> > > > > > > > > >>>> understanding and common usage patterns.
> > > > > > > > > >>>>
> > > > > > > > > >>>> So even if the UABI contract is not documented (ugh), you need to be
> > > > > > > > > >>>> prepared to set the UABI contract through kernel implementation.
> > > > > > > > > >>>
> > > > > > > > > >>> To be the devil's advocate it probably wouldn't be an ABI regression but
> > > > > > > > > >>> just an regression. Same way as what nice(2) priorities mean hasn't
> > > > > > > > > >>> always been the same over the years, I don't think there is a strict
> > > > > > > > > >>> contract.
> > > > > > > > > >>>
> > > > > > > > > >>> Having said that, it may be different with latency sensitive stuff such
> > > > > > > > > >>> as UIs though since it is very observable and can be very painful to users.
> > > > > > > > > >>>
> > > > > > > > > >>>> If you do not document the UABI contract, then different drivers are
> > > > > > > > > >>>> likely to implement it differently, leading to differing behaviour.
> > > > > > > > > >>>> Also userspace will invent wild ways to abuse the UABI if there is no
> > > > > > > > > >>>> documentation guiding it on proper use. If userspace or end users
> > > > > > > > > >>>> observe different behaviour, that's bad even if it's not a regression.
> > > > > > > > > >>>>
> > > > > > > > > >>>> I don't like the situation either, but it is what it is. UABI stability
> > > > > > > > > >>>> trumps everything regardless of whether it was documented or not.
> > > > > > > > > >>>>
> > > > > > > > > >>>> I bet userspace is going to use this as a "make it faster, make it
> > > > > > > > > >>>> hotter" button. I would not be surprised if someone wrote a LD_PRELOAD
> > > > > > > > > >>>> library that stamps any and all fences with an expired deadline to
> > > > > > > > > >>>> just squeeze out a little more through some weird side-effect.
> > > > > > > > > >>>>
> > > > > > > > > >>>> Well, that's hopefully overboard in scaring, but in the end, I would
> > > > > > > > > >>>> like to see UABI documented so I can have a feeling of what it is for
> > > > > > > > > >>>> and how it was intended to be used. That's all.
> > > > > > > > > >>>
> > > > > > > > > >>> We share the same concern. If you read elsewhere in these threads you
> > > > > > > > > >>> will notice I have been calling this an "arms race". If the ability to
> > > > > > > > > >>> make yourself go faster does not required additional privilege I also
> > > > > > > > > >>> worry everyone will do it at which point it becomes pointless. So yes, I
> > > > > > > > > >>> do share this concern about exposing any of this as an unprivileged uapi.
> > > > > > > > > >>>
> > > > > > > > > >>> Is it possible to limit access to only compositors in some sane way?
> > > > > > > > > >>> Sounds tricky when dma-fence should be disconnected from DRM..
> > > > > > > > > >>
> > > > > > > > > >> Maybe it's not that bad in this particular case, because we are talking
> > > > > > > > > >> only about boosting GPU clocks which benefits everyone (except
> > > > > > > > > >> battery life) and it does not penalize other programs like e.g.
> > > > > > > > > >> job priorities do.
> > > > > > > > > >
> > > > > > > > > > Apart from efficiency that you mentioned, which does not always favor
> > > > > > > > > > higher clocks, sometimes thermal budget is also shared between CPU and
> > > > > > > > > > GPU. So more GPU clocks can mean fewer CPU clocks. It's really hard to
> > > > > > > > > > make optimal choices without the full coordination between both schedulers.
> > > > > > > > > >
> > > > > > > > > > But that is even not the main point, which is that if everyone sets the
> > > > > > > > > > immediate deadline then having the deadline API is a bit pointless. For
> > > > > > > > > > instance there is a reason negative nice needs CAP_SYS_ADMIN.
> > > > > > > > > >
> > > > > > > > > > However Rob has also pointed out the existence of uclamp.min via
> > > > > > > > > > sched_setattr which is unprivileged and can influence frequency
> > > > > > > > > > selection in the CPU world, so I conceded on that point. If CPU world
> > > > > > > > > > has accepted it so can we I guess.
> > > > > > > > > >
> > > > > > > > > > So IMO we are back to whether we can agree defining it is a hint is good
> > > > > > > > > > enough, be in via the name of the ioctl/flag itself or via documentation.
> > > > > > > > > >
> > > > > > > > > >> Drivers are not going to use the deadline for scheduling priorities,
> > > > > > > > > >> right? I don't recall seeing any mention of that.
> > > > > > > > > >>
> > > > > > > > > >> ...right?
> > > > > > > > > >
> > > > > > > > > > I wouldn't have thought it would be beneficial to preclude that, or
> > > > > > > > > > assume what drivers would do with the info to begin with.
> > > > > > > > > >
> > > > > > > > > > For instance in i915 we almost had a deadline based scheduler which was
> > > > > > > > > > much fairer than the current priority sorted fifo and in an ideal world
> > > > > > > > > > we would either revive or re-implement that idea. In which case
> > > > > > > > > > considering the fence deadline would naturally slot in and give true
> > > > > > > > > > integration with compositor deadlines (not just boost clocks and pray it
> > > > > > > > > > helps).
> > > > > > > > > How is user-space to decide whether to use ioctl(SET_DEADLINE) or
> > > > > > > > > poll(POLLPRI)?
> > > > > > > >
> > > > > > > > Implementation of blocking gl/vk/cl APIs, like glFinish() would use
> > > > > > > > poll(POLLPRI).  It could also set an immediate deadline and then call
> > > > > > > > poll() without POLLPRI.
> > > > > > > >
> > > > > > > > Other than compositors which do frame-pacing I expect the main usage
> > > > > > > > of either of these is mesa.
> > > > > > >
> > > > > > > Okay, so it looks like we already agreed that having a way to bump frequency
> > > > > > > from userspace is acceptable. either because there are already other ways
> > > > > > > that you can waste power and because this already acceptable in the CPU
> > > > > > > world.
> > > > > > >
> > > > > > > But why we are doing this in hidden ways then?
> > > > > > >
> > > > > > > Why can't we have this hint per context that is getting executed?
> > > > > > > (either with a boost-context flag or with some low/med/max or '-1' to '1'
> > > > > > > value like the latency priority)?
> > > > > > >
> > > > > > > I don't like the waitboost because this heurisitic fails in some media cases.
> > > > > > > I don't like the global setting because we might be alternating a top-priority
> > > > > > > with low-priority cases...
> > > > > > >
> > > > > > > So, why not something per context in execution?
> > > > > > >
> > > > > >
> > > > > > It needs to be finer granularity than per-context, because not all
> > > > > > waits should trigger boosting.  For example, virglrenderer ends up
> > > > > > with a thread polling unsignaled fences to know when to signal an
> > > > > > interrupt to the guest virtgpu.  This alone shouldn't trigger
> > > > > > boosting.  (We also wouldn't want to completely disable boosting for
> > > > > > virglrenderer.)  Or the usermode driver could be waiting on a fence to
> > > > > > know when to do some cleanup.
> > > > > >
> > > > > > That is not to say that there isn't room for per-context flags to
> > > > > > disable/enable boosting for fences created by that context, meaning it
> > > > > > could be an AND operation for i915 if it needs to be.
> > > > >
> > > > > First of all, I believe that the fence deadline hint is a good idea.
> > > > > With that being said, I also don't think it is sufficient in a lot of
> > > > > cases.
> > > > >
> > > > > The one thing I was alluding to before and that Pekka mentioned as
> > > > > well is that mutter for example has a problem where we're missing the
> > > > > deadline consistently because the clocks don't ramp up fast enough and
> > > > > there is a MR which is just trying to keep the GPU busy to avoid this.
> > > >
> > > > the dynamic double/triple buffer thing?
> > >
> > > Yes
> > >
> > > > > It would be much better if the kernel could make sure the clocks are
> > > > > all ramped up when we start submitting work. In the compositor we
> > > > > actually have a lot of information that *should* influence clocks. We
> > > > > know when we're going to start submitting work and when the deadline
> > > > > for that work is beforehand. We know which windows are visible, and
> > > > > which one should have the highest priority.
> > > >
> > > > This sounds like something orthogonal.. something for cgroups?  Ie.
> > > > android moves visible/foreground apps to a different cgroup to given
> > > > them higher priority.  Tvrtko had a patchset to add drm cgroup
> > > > support..
> > >
> > > For the priority stuff, yes, probably. The visibility information on
> > > the other hand could be used to determine if we want to ramp up the
> > > GPU in the first place.
> >
> > Right, but I think that we could have multiple cgroup based knobs, one
> > that adjusts priority and one that limits/disables deadline based
> > boost?  This way the compositor could setup different policies for
> > visible vs hidden apps influencing both how much time they get on the
> > GPU and boost.
>
> I'm not sure if a negative control really makes sense. There are
> probably timing sensitive use cases where the result doesn't show up
> on the local screen and penalizing them when they are not focused or
> hidden might also not be the best idea.

Policy decisions are up to the OS/distro.. we can only provide
controls that can be tuned, it is up to someone else to choose how to
use those controls, such as whether it wants to differentiate between
visible and non-visible apps.  That is why cgroups and a negative
control are a good solution for controlling what the driver does with
the deadline boost hint.

> > > > > We know when there are
> > > > > input events which actually matter.
> > > >
> > > > This I see input as a different boost source for the driver.  (Ie. one
> > > > boost signal is missing fence deadlines, another is input events,
> > > > etc.)
> > > >
> > > > We end up using downstream input-handlers on the kernel side for this.
> > > > Partially for the freq boost (but mostly not, UI interactive workloads
> > > > like touchscreen scrolling don't generally need high GPU freqs, they
> > > > are more memory bandwidth limited if they are limited by anything)..
> > > > really the reason here is to get a head-start on the ~2ms that it
> > > > takes to power up the GPU if it is suspended.
> > >
> > > Right, but one of my main points I want to make here is that we could
> > > get the head-start not only in response to input events but also for
> > > the GPU work the compositor submits and in the future also to GPU work
> > > that clients commit. Except that we don't have a way to tell the
> > > kernel about it.
> > >
> > > > But this is not quite perfect, since for example some keys should be
> > > > handled on key-down but others on key-up.
> > > >
> > > > But again, this is something different from fence deadlines.  I'm
> > > > interested in proposals because we do need something for this.  But I
> > > > think it is something is orthogonal to this series.  For input, we
> > > > want the kernel to know long before userspace is ready to submit
> > > > rendering.
> > >
> > > We can do that in the compositor! Input events are really not
> > > something you should care about in the kernel. Input itself is also
> > > not the only indication of incoming animated content. Some network
> > > packets arriving could equally well result in the same situation.
> >
> > We do use input boost not just for GPU freq, but also for CPU freq.
> > It seems like triggering it from the kernel could happen somewhat
> > sooner than userspace.  (But just speculation.)
>
> Technically it has to be sooner but I doubt it makes any difference.
> Getting a lot of false-positives on the other hand does make a
> difference.

Regardless of whether it is done in kernel or userspace, you do want a
cooldown period so you are not constantly boosting.

Generally false-positives aren't much of a problem.. ie. touch or
mouse events are not ambiguous.  The exception is key events, because
ideally you don't want to hard-code in the kernel which keys are
modifier keys that should be handled on key-up instead of key-down.
But I think if we allowed userspace to configure this somehow, it
would be perfectly reasonable (and optimal) to handle input boost in
the kernel.

> > I'm not sure network packets count.  Or at least compared to a touch
> > interface, the user won't perceive the lag nearly as much, compared to
> > whether or not the UI tracks their fingertips.
>
> Sure, stutter in interactive cases is especially awful but stutter is
> awful in general. If we can solve it in all cases we should do so.

Sure, but the cases where I've seen a need for input boost are all
about the transition from idle->active.  Ie. the display panel has
gone to self-refresh, gpu is suspended, cpu and memory is downclocked.
And suddenly the user decides they want to start scrolling.  We need
some help to adapt to the new state where we are all of a sudden busy
doing something.  This is where input-boost comes in.  If there is no
direct connection to user input, then there is no lag for the user to
percieve.

> > > > > We know when the deadline for
> > > > > client work is.
> > > > >
> > > > > In the future we also want to make sure clients know beforehand when
> > > > > they should start their work and when the deadline is but that's all
> > > > > very much WIP in both wayland and vulkan.
> > > > >
> > > > > There are two issues:
> > > > >
> > > > > 1. The compositor has no way to communicate any of that information to
> > > > > the kernel.
> > > > > 2. The only connection to client work the compositor has is a fence to
> > > > > the last bit of work that must be done before the deadline after a
> > > > > wl_surface.commit.
> > > >
> > > > If the client isn't using multiple GPUs, a single fence should be
> > > > sufficient.  And even if it is, well we still have all the dependency
> > > > information on the kernel side.  Ie. drm/sched knows what fences it is
> > > > waiting on if it is waiting to schedule the work associated with the
> > > > last fence.  It would otherwise require drm/sched to be a bit more
> > > > tricky than it is so far in this series.
> > > >
> > > > But I think the normal dual-gpu case, the app is only dealing with a single GPU?
> > >
> > > We generally don't know which GPU a client uses though. We know which
> > > one we're using and tell the client that the buffer should be
> > > compatible with it but that's the extent of information we have, until
> > > we get a fence but that fence usually gets to the compositor pretty
> > > late. Way too late for the compositor to tell the kernel to ramp up
> > > the GPU and still have an impact.
> >
> > Are you sure about that?  I'd have expected the app to hand off
> > fence+buffer to the compositor pretty quickly after rendering is
> > submitted.  This is what I've seen elsewhere.
>
> After rendering is submitted it is already too late if the GPU takes
> 2ms to wake up. And if there is no rendering submitted there is no
> fence and if there is no fence it has no deadline. If we want to solve
> that we also need some kind of 'work will be submitted to the queue
> starting from this time' hint.

Right, which is why I view this as a different problem with a
different solution.

> > > It also seems like we're moving away from tracking execution
> > > dependencies with fences when we're switching to user mode fences.
> >
> > I suppose there are two cases..
> >
> > 1) dependent fences all from the same device.. no prob, the right
> > driver gets the signal that it needs to clk up, and figures out the
> > rest on it's own
>
> AFAIU with user mode fences it's impossible for the kernel to figure
> out what work depends on it and it might never signal. The whole
> deadline on fences thing breaks down with user mode fences.

Given that user-mode fences are not a solved problem (and I don't
think they'll ever completely replace kernel fences), I don't think
this is relevant.

BR,
-R

> > 2) dependent fences from different devices.. I suppose if device B is
> > waiting for a fence from device A before it can make forward progress,
> > why not express this as a deadline hint on A's fence?  (But by the
> > time you have this problem, you are probably not really caring about
> > power consumption, so meh..)
> >
> > BR,
> > -R
> >
> > > > > So in both cases a fence is just not the right primitive for us. We
> > > > > need to be able to provide per-context/queue information for work that
> > > > > will happen in the future and we need a way to refer to a
> > > > > context/queue generically and over IPC to boost the clocks of the
> > > > > device that a client is actually using and maybe even give priority.
> > > > >
> > > > > But like I said, having a per-fence deadline is probably still a good
> > > > > idea and doesn't conflict with any of the more coarse information.
> > > >
> > > > Yeah, I think the thing is you need multiple things, and this is only
> > > > one of them ;-)
> > > >
> > > > BR,
> > > > -R
> > > >
> > >
> >
>

[v4,06/14] dma-buf/sync_file: Support (E)POLLPRI

Commit Message

Comments

Patch