[v2,0/3] drm: commit_work scheduling

Message ID	20200930211723.3028059-1-robdclark@gmail.com (mailing list archive)
Headers	show Return-Path: <SRS0=D63m=DH=vger.kernel.org=linux-arm-msm-owner@kernel.org> From: Rob Clark <robdclark@gmail.com> To: dri-devel@lists.freedesktop.org Cc: linux-arm-msm@vger.kernel.org, Tejun Heo <tj@kernel.org>, timmurray@google.com, Daniel Vetter <daniel@ffwll.ch>, Qais Yousef <qais.yousef@arm.com>, Rob Clark <robdclark@chromium.org>, linux-kernel@vger.kernel.org (open list) Subject: [PATCH v2 0/3] drm: commit_work scheduling Date: Wed, 30 Sep 2020 14:17:19 -0700 Message-Id: <20200930211723.3028059-1-robdclark@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	drm: commit_work scheduling \| expand [v2,0/3] drm: commit_work scheduling [v2,1/3] drm/crtc: Introduce per-crtc kworker [v2,2/3] drm/atomic: Use kthread worker for nonblocking commits [v2,3/3] drm: Expose CRTC's kworker task id

Rob Clark Sept. 30, 2020, 9:17 p.m. UTC

From: Rob Clark <robdclark@chromium.org>

The android userspace treats the display pipeline as a realtime problem.
And arguably, if your goal is to not miss frame deadlines (ie. vblank),
it is.  (See https://lwn.net/Articles/809545/ for the best explaination
that I found.)

But this presents a problem with using workqueues for non-blocking
atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
preempt the worker.  Which is not really the outcome you want.. once
the required fences are scheduled, you want to push the atomic commit
down to hw ASAP.

But the decision of whether commit_work should be RT or not really
depends on what userspace is doing.  For a pure CFS userspace display
pipeline, commit_work() should remain SCHED_NORMAL.

To handle this, convert non-blocking commit_work() to use per-CRTC
kthread workers, instead of system_unbound_wq.  Per-CRTC workers are
used to avoid serializing commits when userspace is using a per-CRTC
update loop.  And the last patch exposes the task id to userspace as
a CRTC property, so that userspace can adjust the priority and sched
policy to fit it's needs.


v2: Drop client cap and in-kernel setting of priority/policy in
    favor of exposing the kworker tid to userspace so that user-
    space can set priority/policy.

Rob Clark (3):
  drm/crtc: Introduce per-crtc kworker
  drm/atomic: Use kthread worker for nonblocking commits
  drm: Expose CRTC's kworker task id

 drivers/gpu/drm/drm_atomic_helper.c | 13 ++++++++----
 drivers/gpu/drm/drm_crtc.c          | 14 +++++++++++++
 drivers/gpu/drm/drm_mode_config.c   | 14 +++++++++++++
 drivers/gpu/drm/drm_mode_object.c   |  4 ++++
 include/drm/drm_atomic.h            | 31 +++++++++++++++++++++++++++++
 include/drm/drm_crtc.h              |  8 ++++++++
 include/drm/drm_mode_config.h       |  9 +++++++++
 include/drm/drm_property.h          |  9 +++++++++
 8 files changed, 98 insertions(+), 4 deletions(-)

Daniel Vetter Oct. 1, 2020, 7:25 a.m. UTC | #1

On Wed, Sep 30, 2020 at 11:16 PM Rob Clark <robdclark@gmail.com> wrote:
>
> From: Rob Clark <robdclark@chromium.org>
>
> The android userspace treats the display pipeline as a realtime problem.
> And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> it is.  (See https://lwn.net/Articles/809545/ for the best explaination
> that I found.)
>
> But this presents a problem with using workqueues for non-blocking
> atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> preempt the worker.  Which is not really the outcome you want.. once
> the required fences are scheduled, you want to push the atomic commit
> down to hw ASAP.
>
> But the decision of whether commit_work should be RT or not really
> depends on what userspace is doing.  For a pure CFS userspace display
> pipeline, commit_work() should remain SCHED_NORMAL.
>
> To handle this, convert non-blocking commit_work() to use per-CRTC
> kthread workers, instead of system_unbound_wq.  Per-CRTC workers are
> used to avoid serializing commits when userspace is using a per-CRTC
> update loop.  And the last patch exposes the task id to userspace as
> a CRTC property, so that userspace can adjust the priority and sched
> policy to fit it's needs.
>
>
> v2: Drop client cap and in-kernel setting of priority/policy in
>     favor of exposing the kworker tid to userspace so that user-
>     space can set priority/policy.

Yeah I think this looks more reasonable. Still a bit irky interface,
so I'd like to get some kworker/rt ack on this. Other opens:
- needs userspace, the usual drill
- we need this also for vblank workers, otherwise this wont work for
drivers needing those because of another priority inversion.
- we probably want some indication of whether this actually does
something useful, not all drivers use atomic commit helpers. Not sure
how to do that.
- not sure whether the vfunc is an awesome idea, I'd frankly just
open-code this inline. We have similar special cases already for e.g.
dpms (and in multiple places), this isn't the worst.
- still feeling we could at least change the default to highpriority niceness.
- there's still the problem that commit works can overlap, and a
single worker can't do that anymore. So rolling that out for everyone
as-is feels a bit risky.

Cheers, Daniel

>
> Rob Clark (3):
>   drm/crtc: Introduce per-crtc kworker
>   drm/atomic: Use kthread worker for nonblocking commits
>   drm: Expose CRTC's kworker task id
>
>  drivers/gpu/drm/drm_atomic_helper.c | 13 ++++++++----
>  drivers/gpu/drm/drm_crtc.c          | 14 +++++++++++++
>  drivers/gpu/drm/drm_mode_config.c   | 14 +++++++++++++
>  drivers/gpu/drm/drm_mode_object.c   |  4 ++++
>  include/drm/drm_atomic.h            | 31 +++++++++++++++++++++++++++++
>  include/drm/drm_crtc.h              |  8 ++++++++
>  include/drm/drm_mode_config.h       |  9 +++++++++
>  include/drm/drm_property.h          |  9 +++++++++
>  8 files changed, 98 insertions(+), 4 deletions(-)
>
> --
> 2.26.2
>

Rob Clark Oct. 1, 2020, 3:15 p.m. UTC | #2

On Thu, Oct 1, 2020 at 12:25 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Wed, Sep 30, 2020 at 11:16 PM Rob Clark <robdclark@gmail.com> wrote:
> >
> > From: Rob Clark <robdclark@chromium.org>
> >
> > The android userspace treats the display pipeline as a realtime problem.
> > And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> > it is.  (See https://lwn.net/Articles/809545/ for the best explaination
> > that I found.)
> >
> > But this presents a problem with using workqueues for non-blocking
> > atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> > preempt the worker.  Which is not really the outcome you want.. once
> > the required fences are scheduled, you want to push the atomic commit
> > down to hw ASAP.
> >
> > But the decision of whether commit_work should be RT or not really
> > depends on what userspace is doing.  For a pure CFS userspace display
> > pipeline, commit_work() should remain SCHED_NORMAL.
> >
> > To handle this, convert non-blocking commit_work() to use per-CRTC
> > kthread workers, instead of system_unbound_wq.  Per-CRTC workers are
> > used to avoid serializing commits when userspace is using a per-CRTC
> > update loop.  And the last patch exposes the task id to userspace as
> > a CRTC property, so that userspace can adjust the priority and sched
> > policy to fit it's needs.
> >
> >
> > v2: Drop client cap and in-kernel setting of priority/policy in
> >     favor of exposing the kworker tid to userspace so that user-
> >     space can set priority/policy.
>
> Yeah I think this looks more reasonable. Still a bit irky interface,
> so I'd like to get some kworker/rt ack on this. Other opens:
> - needs userspace, the usual drill

fwiw, right now the userspace is "modetest + chrt".. *probably* the
userspace will become a standalone helper or daemon, mostly because
the chrome gpu-process sandbox does not allow setting SCHED_FIFO.  I'm
still entertaining the possibility of switching between rt and cfs
depending on what is in the foreground (ie. only do rt for android
apps).

> - we need this also for vblank workers, otherwise this wont work for
> drivers needing those because of another priority inversion.

I have a thought on that, see below..

> - we probably want some indication of whether this actually does
> something useful, not all drivers use atomic commit helpers. Not sure
> how to do that.

I'm leaning towards converting the other drivers over to use the
per-crtc kwork, and then dropping the 'commit_work` from atomic state.
I can add a patch to that, but figured I could postpone that churn
until there is some by-in on this whole idea.

> - not sure whether the vfunc is an awesome idea, I'd frankly just
> open-code this inline. We have similar special cases already for e.g.
> dpms (and in multiple places), this isn't the worst.

I could introduce a "PID" property type.  This would be useful if we
wanted to also expose the vblank workers.

> - still feeling we could at least change the default to highpriority niceness.

AFAIU this would still be preempted by something that is SCHED_FIFO.
Also, with cfs/SCHED_NORMAL, you can still be preempted by lower
priority things that haven't had a chance to run for a while.

> - there's still the problem that commit works can overlap, and a
> single worker can't do that anymore. So rolling that out for everyone
> as-is feels a bit risky.

That is why it is per-CRTC..  I'm not sure there is a need to overlap
work for a single CRTC?

We could OFC make this a driver knob, and keep the old 'commit_work'
option, but that doesn't really feel like the right direction

BR,
-R

> Cheers, Daniel
>
> >
> > Rob Clark (3):
> >   drm/crtc: Introduce per-crtc kworker
> >   drm/atomic: Use kthread worker for nonblocking commits
> >   drm: Expose CRTC's kworker task id
> >
> >  drivers/gpu/drm/drm_atomic_helper.c | 13 ++++++++----
> >  drivers/gpu/drm/drm_crtc.c          | 14 +++++++++++++
> >  drivers/gpu/drm/drm_mode_config.c   | 14 +++++++++++++
> >  drivers/gpu/drm/drm_mode_object.c   |  4 ++++
> >  include/drm/drm_atomic.h            | 31 +++++++++++++++++++++++++++++
> >  include/drm/drm_crtc.h              |  8 ++++++++
> >  include/drm/drm_mode_config.h       |  9 +++++++++
> >  include/drm/drm_property.h          |  9 +++++++++
> >  8 files changed, 98 insertions(+), 4 deletions(-)
> >
> > --
> > 2.26.2
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

Daniel Vetter Oct. 1, 2020, 3:25 p.m. UTC | #3

On Thu, Oct 1, 2020 at 5:15 PM Rob Clark <robdclark@gmail.com> wrote:
>
> On Thu, Oct 1, 2020 at 12:25 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Wed, Sep 30, 2020 at 11:16 PM Rob Clark <robdclark@gmail.com> wrote:
> > >
> > > From: Rob Clark <robdclark@chromium.org>
> > >
> > > The android userspace treats the display pipeline as a realtime problem.
> > > And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> > > it is.  (See https://lwn.net/Articles/809545/ for the best explaination
> > > that I found.)
> > >
> > > But this presents a problem with using workqueues for non-blocking
> > > atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> > > preempt the worker.  Which is not really the outcome you want.. once
> > > the required fences are scheduled, you want to push the atomic commit
> > > down to hw ASAP.
> > >
> > > But the decision of whether commit_work should be RT or not really
> > > depends on what userspace is doing.  For a pure CFS userspace display
> > > pipeline, commit_work() should remain SCHED_NORMAL.
> > >
> > > To handle this, convert non-blocking commit_work() to use per-CRTC
> > > kthread workers, instead of system_unbound_wq.  Per-CRTC workers are
> > > used to avoid serializing commits when userspace is using a per-CRTC
> > > update loop.  And the last patch exposes the task id to userspace as
> > > a CRTC property, so that userspace can adjust the priority and sched
> > > policy to fit it's needs.
> > >
> > >
> > > v2: Drop client cap and in-kernel setting of priority/policy in
> > >     favor of exposing the kworker tid to userspace so that user-
> > >     space can set priority/policy.
> >
> > Yeah I think this looks more reasonable. Still a bit irky interface,
> > so I'd like to get some kworker/rt ack on this. Other opens:
> > - needs userspace, the usual drill
>
> fwiw, right now the userspace is "modetest + chrt".. *probably* the
> userspace will become a standalone helper or daemon, mostly because
> the chrome gpu-process sandbox does not allow setting SCHED_FIFO.  I'm
> still entertaining the possibility of switching between rt and cfs
> depending on what is in the foreground (ie. only do rt for android
> apps).
>
> > - we need this also for vblank workers, otherwise this wont work for
> > drivers needing those because of another priority inversion.
>
> I have a thought on that, see below..

Hm, not seeing anything about vblank worker below?

> > - we probably want some indication of whether this actually does
> > something useful, not all drivers use atomic commit helpers. Not sure
> > how to do that.
>
> I'm leaning towards converting the other drivers over to use the
> per-crtc kwork, and then dropping the 'commit_work` from atomic state.
> I can add a patch to that, but figured I could postpone that churn
> until there is some by-in on this whole idea.

i915 has its own commit code, it's not even using the current commit
helpers (nor the commit_work). Not sure how much other fun there is.

> > - not sure whether the vfunc is an awesome idea, I'd frankly just
> > open-code this inline. We have similar special cases already for e.g.
> > dpms (and in multiple places), this isn't the worst.
>
> I could introduce a "PID" property type.  This would be useful if we
> wanted to also expose the vblank workers.

Hm right, but I think we need at most 2 for commit thread and vblank
thread (at least with the current design). So open-coded if with two
if (prop == crtc_worker_pid_prop || prop  ==
crtc_vblank_worker_pid_prop) doesn't seem too horrible to me.
Otherwise people start creating really funny stuff in their drivers
with this backend, and I don't want to spend all the time making sure
they don't abuse this :-)

> > - still feeling we could at least change the default to highpriority niceness.
>
> AFAIU this would still be preempted by something that is SCHED_FIFO.
> Also, with cfs/SCHED_NORMAL, you can still be preempted by lower
> priority things that haven't had a chance to run for a while.

i915 uses highprio workqueue, so I guess to avoid regressions we need
that (it's also not using the commit helpers right now, but no reason
not to afaics, stuff simply happened in parallel back then.

> > - there's still the problem that commit works can overlap, and a
> > single worker can't do that anymore. So rolling that out for everyone
> > as-is feels a bit risky.
>
> That is why it is per-CRTC..  I'm not sure there is a need to overlap
> work for a single CRTC?
>
> We could OFC make this a driver knob, and keep the old 'commit_work'
> option, but that doesn't really feel like the right direction

drm_atomic_helper_commit_hw_done is what unblocks the next worker on
the same set of crtc. It's before we do all the buffer cleanup, which
has a full vblank stall beforehand. Most drivers also have the same
vblank stall in their next commit, plus generally the fb cleanup is
cheap, but neither is a requirement. So yeah you can get overlapping
commit work on the same crtc, and that was kinda intentional. Maybe
was overkill, I guess minimally just something that needs to be in the
commit message.
-Daniel

>
> BR,
> -R
>
> > Cheers, Daniel
> >
> > >
> > > Rob Clark (3):
> > >   drm/crtc: Introduce per-crtc kworker
> > >   drm/atomic: Use kthread worker for nonblocking commits
> > >   drm: Expose CRTC's kworker task id
> > >
> > >  drivers/gpu/drm/drm_atomic_helper.c | 13 ++++++++----
> > >  drivers/gpu/drm/drm_crtc.c          | 14 +++++++++++++
> > >  drivers/gpu/drm/drm_mode_config.c   | 14 +++++++++++++
> > >  drivers/gpu/drm/drm_mode_object.c   |  4 ++++
> > >  include/drm/drm_atomic.h            | 31 +++++++++++++++++++++++++++++
> > >  include/drm/drm_crtc.h              |  8 ++++++++
> > >  include/drm/drm_mode_config.h       |  9 +++++++++
> > >  include/drm/drm_property.h          |  9 +++++++++
> > >  8 files changed, 98 insertions(+), 4 deletions(-)
> > >
> > > --
> > > 2.26.2
> > >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

Ville Syrjälä Oct. 2, 2020, 10:52 a.m. UTC | #4

On Thu, Oct 01, 2020 at 05:25:55PM +0200, Daniel Vetter wrote:
> On Thu, Oct 1, 2020 at 5:15 PM Rob Clark <robdclark@gmail.com> wrote:
> >
> > On Thu, Oct 1, 2020 at 12:25 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > On Wed, Sep 30, 2020 at 11:16 PM Rob Clark <robdclark@gmail.com> wrote:
> > > >
> > > > From: Rob Clark <robdclark@chromium.org>
> > > >
> > > > The android userspace treats the display pipeline as a realtime problem.
> > > > And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> > > > it is.  (See https://lwn.net/Articles/809545/ for the best explaination
> > > > that I found.)
> > > >
> > > > But this presents a problem with using workqueues for non-blocking
> > > > atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> > > > preempt the worker.  Which is not really the outcome you want.. once
> > > > the required fences are scheduled, you want to push the atomic commit
> > > > down to hw ASAP.
> > > >
> > > > But the decision of whether commit_work should be RT or not really
> > > > depends on what userspace is doing.  For a pure CFS userspace display
> > > > pipeline, commit_work() should remain SCHED_NORMAL.
> > > >
> > > > To handle this, convert non-blocking commit_work() to use per-CRTC
> > > > kthread workers, instead of system_unbound_wq.  Per-CRTC workers are
> > > > used to avoid serializing commits when userspace is using a per-CRTC
> > > > update loop.  And the last patch exposes the task id to userspace as
> > > > a CRTC property, so that userspace can adjust the priority and sched
> > > > policy to fit it's needs.
> > > >
> > > >
> > > > v2: Drop client cap and in-kernel setting of priority/policy in
> > > >     favor of exposing the kworker tid to userspace so that user-
> > > >     space can set priority/policy.
> > >
> > > Yeah I think this looks more reasonable. Still a bit irky interface,
> > > so I'd like to get some kworker/rt ack on this. Other opens:
> > > - needs userspace, the usual drill
> >
> > fwiw, right now the userspace is "modetest + chrt".. *probably* the
> > userspace will become a standalone helper or daemon, mostly because
> > the chrome gpu-process sandbox does not allow setting SCHED_FIFO.  I'm
> > still entertaining the possibility of switching between rt and cfs
> > depending on what is in the foreground (ie. only do rt for android
> > apps).
> >
> > > - we need this also for vblank workers, otherwise this wont work for
> > > drivers needing those because of another priority inversion.
> >
> > I have a thought on that, see below..
> 
> Hm, not seeing anything about vblank worker below?
> 
> > > - we probably want some indication of whether this actually does
> > > something useful, not all drivers use atomic commit helpers. Not sure
> > > how to do that.
> >
> > I'm leaning towards converting the other drivers over to use the
> > per-crtc kwork, and then dropping the 'commit_work` from atomic state.
> > I can add a patch to that, but figured I could postpone that churn
> > until there is some by-in on this whole idea.
> 
> i915 has its own commit code, it's not even using the current commit
> helpers (nor the commit_work). Not sure how much other fun there is.

I don't think we want per-crtc threads for this in i915. Seems
to me easier to guarantee atomicity across multiple crtcs if
we just commit them from the same thread.

Qais Yousef Oct. 2, 2020, 11:01 a.m. UTC | #5

On 09/30/20 14:17, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> The android userspace treats the display pipeline as a realtime problem.
> And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> it is.  (See https://lwn.net/Articles/809545/ for the best explaination
> that I found.)
> 
> But this presents a problem with using workqueues for non-blocking
> atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> preempt the worker.  Which is not really the outcome you want.. once
> the required fences are scheduled, you want to push the atomic commit
> down to hw ASAP.

For me thees 2 properties

	1. Run ASAP
	2. Finish the work un-interrupted

Scream the workers need to be SCHED_FIFO by default. CFS can't give you these
guarantees.

IMO using sched_set_fifo() for these workers is the right thing.

> 
> But the decision of whether commit_work should be RT or not really
> depends on what userspace is doing.  For a pure CFS userspace display
> pipeline, commit_work() should remain SCHED_NORMAL.

I'm not sure I agree with this. I think it's better to characterize tasks based
on their properties/requirements rather than what the rest of the userspace is
using.

I do appreciate that maybe some of these tasks have varying requirements during
their life time. e.g: they have RT property during specific critical section
but otherwise are CFS tasks. I think the UI thread in Android behaves like
that.

It's worth IMO trying that approach I pointed out earlier to see if making RT
try to pick an idle CPU rather than preempt CFS helps. Not sure if it'd be
accepted but IMHO it's a better direction to consider and discuss.

Or maybe you can wrap userspace pipeline critical section lock such that any
task holding it will automatically be promoted to SCHED_FIFO and then demoted
to CFS once it releases it.

Haven't worked with display pipelines before, so hopefully this makes sense :-)

Thanks

--
Qais Yousef

> 
> To handle this, convert non-blocking commit_work() to use per-CRTC
> kthread workers, instead of system_unbound_wq.  Per-CRTC workers are
> used to avoid serializing commits when userspace is using a per-CRTC
> update loop.  And the last patch exposes the task id to userspace as
> a CRTC property, so that userspace can adjust the priority and sched
> policy to fit it's needs.
> 
> 
> v2: Drop client cap and in-kernel setting of priority/policy in
>     favor of exposing the kworker tid to userspace so that user-
>     space can set priority/policy.
> 
> Rob Clark (3):
>   drm/crtc: Introduce per-crtc kworker
>   drm/atomic: Use kthread worker for nonblocking commits
>   drm: Expose CRTC's kworker task id
> 
>  drivers/gpu/drm/drm_atomic_helper.c | 13 ++++++++----
>  drivers/gpu/drm/drm_crtc.c          | 14 +++++++++++++
>  drivers/gpu/drm/drm_mode_config.c   | 14 +++++++++++++
>  drivers/gpu/drm/drm_mode_object.c   |  4 ++++
>  include/drm/drm_atomic.h            | 31 +++++++++++++++++++++++++++++
>  include/drm/drm_crtc.h              |  8 ++++++++
>  include/drm/drm_mode_config.h       |  9 +++++++++
>  include/drm/drm_property.h          |  9 +++++++++
>  8 files changed, 98 insertions(+), 4 deletions(-)
> 
> -- 
> 2.26.2
>

Ville Syrjälä Oct. 2, 2020, 11:05 a.m. UTC | #6

On Fri, Oct 02, 2020 at 01:52:56PM +0300, Ville Syrjälä wrote:
> On Thu, Oct 01, 2020 at 05:25:55PM +0200, Daniel Vetter wrote:
> > On Thu, Oct 1, 2020 at 5:15 PM Rob Clark <robdclark@gmail.com> wrote:
> > >
> > > On Thu, Oct 1, 2020 at 12:25 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >
> > > > On Wed, Sep 30, 2020 at 11:16 PM Rob Clark <robdclark@gmail.com> wrote:
> > > > >
> > > > > From: Rob Clark <robdclark@chromium.org>
> > > > >
> > > > > The android userspace treats the display pipeline as a realtime problem.
> > > > > And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> > > > > it is.  (See https://lwn.net/Articles/809545/ for the best explaination
> > > > > that I found.)
> > > > >
> > > > > But this presents a problem with using workqueues for non-blocking
> > > > > atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> > > > > preempt the worker.  Which is not really the outcome you want.. once
> > > > > the required fences are scheduled, you want to push the atomic commit
> > > > > down to hw ASAP.
> > > > >
> > > > > But the decision of whether commit_work should be RT or not really
> > > > > depends on what userspace is doing.  For a pure CFS userspace display
> > > > > pipeline, commit_work() should remain SCHED_NORMAL.
> > > > >
> > > > > To handle this, convert non-blocking commit_work() to use per-CRTC
> > > > > kthread workers, instead of system_unbound_wq.  Per-CRTC workers are
> > > > > used to avoid serializing commits when userspace is using a per-CRTC
> > > > > update loop.  And the last patch exposes the task id to userspace as
> > > > > a CRTC property, so that userspace can adjust the priority and sched
> > > > > policy to fit it's needs.
> > > > >
> > > > >
> > > > > v2: Drop client cap and in-kernel setting of priority/policy in
> > > > >     favor of exposing the kworker tid to userspace so that user-
> > > > >     space can set priority/policy.
> > > >
> > > > Yeah I think this looks more reasonable. Still a bit irky interface,
> > > > so I'd like to get some kworker/rt ack on this. Other opens:
> > > > - needs userspace, the usual drill
> > >
> > > fwiw, right now the userspace is "modetest + chrt".. *probably* the
> > > userspace will become a standalone helper or daemon, mostly because
> > > the chrome gpu-process sandbox does not allow setting SCHED_FIFO.  I'm
> > > still entertaining the possibility of switching between rt and cfs
> > > depending on what is in the foreground (ie. only do rt for android
> > > apps).
> > >
> > > > - we need this also for vblank workers, otherwise this wont work for
> > > > drivers needing those because of another priority inversion.
> > >
> > > I have a thought on that, see below..
> > 
> > Hm, not seeing anything about vblank worker below?
> > 
> > > > - we probably want some indication of whether this actually does
> > > > something useful, not all drivers use atomic commit helpers. Not sure
> > > > how to do that.
> > >
> > > I'm leaning towards converting the other drivers over to use the
> > > per-crtc kwork, and then dropping the 'commit_work` from atomic state.
> > > I can add a patch to that, but figured I could postpone that churn
> > > until there is some by-in on this whole idea.
> > 
> > i915 has its own commit code, it's not even using the current commit
> > helpers (nor the commit_work). Not sure how much other fun there is.
> 
> I don't think we want per-crtc threads for this in i915. Seems
> to me easier to guarantee atomicity across multiple crtcs if
> we just commit them from the same thread.

Oh, and we may have to commit things in a very specific order
to guarantee the hw doesn't fall over, so yeah definitely per-crtc
thread is a no go.

I don't even understand the serialization argument. If the commits
are truly independent then why isn't the unbound wq enough to avoid
the serialization? It should just spin up a new thread for each commit
no?

Rob Clark Oct. 2, 2020, 5:55 p.m. UTC | #7

On Fri, Oct 2, 2020 at 4:05 AM Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Fri, Oct 02, 2020 at 01:52:56PM +0300, Ville Syrjälä wrote:
> > On Thu, Oct 01, 2020 at 05:25:55PM +0200, Daniel Vetter wrote:
> > > On Thu, Oct 1, 2020 at 5:15 PM Rob Clark <robdclark@gmail.com> wrote:
> > > >
> > > > On Thu, Oct 1, 2020 at 12:25 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > >
> > > > > On Wed, Sep 30, 2020 at 11:16 PM Rob Clark <robdclark@gmail.com> wrote:
> > > > > >
> > > > > > From: Rob Clark <robdclark@chromium.org>
> > > > > >
> > > > > > The android userspace treats the display pipeline as a realtime problem.
> > > > > > And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> > > > > > it is.  (See https://lwn.net/Articles/809545/ for the best explaination
> > > > > > that I found.)
> > > > > >
> > > > > > But this presents a problem with using workqueues for non-blocking
> > > > > > atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> > > > > > preempt the worker.  Which is not really the outcome you want.. once
> > > > > > the required fences are scheduled, you want to push the atomic commit
> > > > > > down to hw ASAP.
> > > > > >
> > > > > > But the decision of whether commit_work should be RT or not really
> > > > > > depends on what userspace is doing.  For a pure CFS userspace display
> > > > > > pipeline, commit_work() should remain SCHED_NORMAL.
> > > > > >
> > > > > > To handle this, convert non-blocking commit_work() to use per-CRTC
> > > > > > kthread workers, instead of system_unbound_wq.  Per-CRTC workers are
> > > > > > used to avoid serializing commits when userspace is using a per-CRTC
> > > > > > update loop.  And the last patch exposes the task id to userspace as
> > > > > > a CRTC property, so that userspace can adjust the priority and sched
> > > > > > policy to fit it's needs.
> > > > > >
> > > > > >
> > > > > > v2: Drop client cap and in-kernel setting of priority/policy in
> > > > > >     favor of exposing the kworker tid to userspace so that user-
> > > > > >     space can set priority/policy.
> > > > >
> > > > > Yeah I think this looks more reasonable. Still a bit irky interface,
> > > > > so I'd like to get some kworker/rt ack on this. Other opens:
> > > > > - needs userspace, the usual drill
> > > >
> > > > fwiw, right now the userspace is "modetest + chrt".. *probably* the
> > > > userspace will become a standalone helper or daemon, mostly because
> > > > the chrome gpu-process sandbox does not allow setting SCHED_FIFO.  I'm
> > > > still entertaining the possibility of switching between rt and cfs
> > > > depending on what is in the foreground (ie. only do rt for android
> > > > apps).
> > > >
> > > > > - we need this also for vblank workers, otherwise this wont work for
> > > > > drivers needing those because of another priority inversion.
> > > >
> > > > I have a thought on that, see below..
> > >
> > > Hm, not seeing anything about vblank worker below?
> > >
> > > > > - we probably want some indication of whether this actually does
> > > > > something useful, not all drivers use atomic commit helpers. Not sure
> > > > > how to do that.
> > > >
> > > > I'm leaning towards converting the other drivers over to use the
> > > > per-crtc kwork, and then dropping the 'commit_work` from atomic state.
> > > > I can add a patch to that, but figured I could postpone that churn
> > > > until there is some by-in on this whole idea.
> > >
> > > i915 has its own commit code, it's not even using the current commit
> > > helpers (nor the commit_work). Not sure how much other fun there is.
> >
> > I don't think we want per-crtc threads for this in i915. Seems
> > to me easier to guarantee atomicity across multiple crtcs if
> > we just commit them from the same thread.
>
> Oh, and we may have to commit things in a very specific order
> to guarantee the hw doesn't fall over, so yeah definitely per-crtc
> thread is a no go.

If I'm understanding the i915 code, this is only the case for modeset
commits?  I suppose we could achieve the same result by just deciding
to pick the kthread of the first CRTC for modeset commits.  I'm not
really so much concerned about parallelism for modeset.

> I don't even understand the serialization argument. If the commits
> are truly independent then why isn't the unbound wq enough to avoid
> the serialization? It should just spin up a new thread for each commit
> no?

The problem with wq is prioritization and SCHED_FIFO userspace
components stomping on the feet of commit_work.  That is the entire
motivation of this series in the first place, so no we cannot use
unbound wq.

BR,
-R

Rob Clark Oct. 2, 2020, 6:07 p.m. UTC | #8

On Fri, Oct 2, 2020 at 4:01 AM Qais Yousef <qais.yousef@arm.com> wrote:
>
> On 09/30/20 14:17, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > The android userspace treats the display pipeline as a realtime problem.
> > And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> > it is.  (See https://lwn.net/Articles/809545/ for the best explaination
> > that I found.)
> >
> > But this presents a problem with using workqueues for non-blocking
> > atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> > preempt the worker.  Which is not really the outcome you want.. once
> > the required fences are scheduled, you want to push the atomic commit
> > down to hw ASAP.
>
> For me thees 2 properties
>
>         1. Run ASAP
>         2. Finish the work un-interrupted
>
> Scream the workers need to be SCHED_FIFO by default. CFS can't give you these
> guarantees.

fwiw, commit_work does sleep/block for some time until fences are
signalled, but then once that happens we want it to run ASAP,
preempting lower priority SCHED_FIFO.

>
> IMO using sched_set_fifo() for these workers is the right thing.
>

Possibly, but we still have limited prioritization options (ie. not
enough) to set these from the kernel.  Giving userspace the control,
so it can pick sensible priorities for commit_work and vblank_work,
which fits in with the priorities of the other userspace threads seems
like the sensible thing.

> >
> > But the decision of whether commit_work should be RT or not really
> > depends on what userspace is doing.  For a pure CFS userspace display
> > pipeline, commit_work() should remain SCHED_NORMAL.
>
> I'm not sure I agree with this. I think it's better to characterize tasks based
> on their properties/requirements rather than what the rest of the userspace is
> using.

I mean, the issue is that userspace is already using a few different
rt priority levels for different SF threads.  We want commit_work to
run ASAP once fences are signalled, and vblank_work to run at a
slightly higher priority still.  But the correct choice for priorities
here depends on what userspace is using, it all needs to fit together
properly.

>
> I do appreciate that maybe some of these tasks have varying requirements during
> their life time. e.g: they have RT property during specific critical section
> but otherwise are CFS tasks. I think the UI thread in Android behaves like
> that.
>
> It's worth IMO trying that approach I pointed out earlier to see if making RT
> try to pick an idle CPU rather than preempt CFS helps. Not sure if it'd be
> accepted but IMHO it's a better direction to consider and discuss.

The problem I was seeing was actually the opposite..  commit_work
becomes runnable (fences signalled) but doesn't get a chance to run
because a SCHED_FIFO SF thread is running.  (Maybe I misunderstood and
you're approach would help this case too?)

> Or maybe you can wrap userspace pipeline critical section lock such that any
> task holding it will automatically be promoted to SCHED_FIFO and then demoted
> to CFS once it releases it.

The SCHED_DEADLINE + token passing approach that the lwn article
mentioned sounds interesting, if that eventually becomes possible.
But doesn't really help today..

BR,
-R

> Haven't worked with display pipelines before, so hopefully this makes sense :-)
>
> Thanks
>
> --
> Qais Yousef
>
> >
> > To handle this, convert non-blocking commit_work() to use per-CRTC
> > kthread workers, instead of system_unbound_wq.  Per-CRTC workers are
> > used to avoid serializing commits when userspace is using a per-CRTC
> > update loop.  And the last patch exposes the task id to userspace as
> > a CRTC property, so that userspace can adjust the priority and sched
> > policy to fit it's needs.
> >
> >
> > v2: Drop client cap and in-kernel setting of priority/policy in
> >     favor of exposing the kworker tid to userspace so that user-
> >     space can set priority/policy.
> >
> > Rob Clark (3):
> >   drm/crtc: Introduce per-crtc kworker
> >   drm/atomic: Use kthread worker for nonblocking commits
> >   drm: Expose CRTC's kworker task id
> >
> >  drivers/gpu/drm/drm_atomic_helper.c | 13 ++++++++----
> >  drivers/gpu/drm/drm_crtc.c          | 14 +++++++++++++
> >  drivers/gpu/drm/drm_mode_config.c   | 14 +++++++++++++
> >  drivers/gpu/drm/drm_mode_object.c   |  4 ++++
> >  include/drm/drm_atomic.h            | 31 +++++++++++++++++++++++++++++
> >  include/drm/drm_crtc.h              |  8 ++++++++
> >  include/drm/drm_mode_config.h       |  9 +++++++++
> >  include/drm/drm_property.h          |  9 +++++++++
> >  8 files changed, 98 insertions(+), 4 deletions(-)
> >
> > --
> > 2.26.2
> >

Ville Syrjälä Oct. 5, 2020, 12:15 p.m. UTC | #9

On Fri, Oct 02, 2020 at 10:55:52AM -0700, Rob Clark wrote:
> On Fri, Oct 2, 2020 at 4:05 AM Ville Syrjälä
> <ville.syrjala@linux.intel.com> wrote:
> >
> > On Fri, Oct 02, 2020 at 01:52:56PM +0300, Ville Syrjälä wrote:
> > > On Thu, Oct 01, 2020 at 05:25:55PM +0200, Daniel Vetter wrote:
> > > > On Thu, Oct 1, 2020 at 5:15 PM Rob Clark <robdclark@gmail.com> wrote:
> > > > >
> > > > > On Thu, Oct 1, 2020 at 12:25 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > >
> > > > > > On Wed, Sep 30, 2020 at 11:16 PM Rob Clark <robdclark@gmail.com> wrote:
> > > > > > >
> > > > > > > From: Rob Clark <robdclark@chromium.org>
> > > > > > >
> > > > > > > The android userspace treats the display pipeline as a realtime problem.
> > > > > > > And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> > > > > > > it is.  (See https://lwn.net/Articles/809545/ for the best explaination
> > > > > > > that I found.)
> > > > > > >
> > > > > > > But this presents a problem with using workqueues for non-blocking
> > > > > > > atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> > > > > > > preempt the worker.  Which is not really the outcome you want.. once
> > > > > > > the required fences are scheduled, you want to push the atomic commit
> > > > > > > down to hw ASAP.
> > > > > > >
> > > > > > > But the decision of whether commit_work should be RT or not really
> > > > > > > depends on what userspace is doing.  For a pure CFS userspace display
> > > > > > > pipeline, commit_work() should remain SCHED_NORMAL.
> > > > > > >
> > > > > > > To handle this, convert non-blocking commit_work() to use per-CRTC
> > > > > > > kthread workers, instead of system_unbound_wq.  Per-CRTC workers are
> > > > > > > used to avoid serializing commits when userspace is using a per-CRTC
> > > > > > > update loop.  And the last patch exposes the task id to userspace as
> > > > > > > a CRTC property, so that userspace can adjust the priority and sched
> > > > > > > policy to fit it's needs.
> > > > > > >
> > > > > > >
> > > > > > > v2: Drop client cap and in-kernel setting of priority/policy in
> > > > > > >     favor of exposing the kworker tid to userspace so that user-
> > > > > > >     space can set priority/policy.
> > > > > >
> > > > > > Yeah I think this looks more reasonable. Still a bit irky interface,
> > > > > > so I'd like to get some kworker/rt ack on this. Other opens:
> > > > > > - needs userspace, the usual drill
> > > > >
> > > > > fwiw, right now the userspace is "modetest + chrt".. *probably* the
> > > > > userspace will become a standalone helper or daemon, mostly because
> > > > > the chrome gpu-process sandbox does not allow setting SCHED_FIFO.  I'm
> > > > > still entertaining the possibility of switching between rt and cfs
> > > > > depending on what is in the foreground (ie. only do rt for android
> > > > > apps).
> > > > >
> > > > > > - we need this also for vblank workers, otherwise this wont work for
> > > > > > drivers needing those because of another priority inversion.
> > > > >
> > > > > I have a thought on that, see below..
> > > >
> > > > Hm, not seeing anything about vblank worker below?
> > > >
> > > > > > - we probably want some indication of whether this actually does
> > > > > > something useful, not all drivers use atomic commit helpers. Not sure
> > > > > > how to do that.
> > > > >
> > > > > I'm leaning towards converting the other drivers over to use the
> > > > > per-crtc kwork, and then dropping the 'commit_work` from atomic state.
> > > > > I can add a patch to that, but figured I could postpone that churn
> > > > > until there is some by-in on this whole idea.
> > > >
> > > > i915 has its own commit code, it's not even using the current commit
> > > > helpers (nor the commit_work). Not sure how much other fun there is.
> > >
> > > I don't think we want per-crtc threads for this in i915. Seems
> > > to me easier to guarantee atomicity across multiple crtcs if
> > > we just commit them from the same thread.
> >
> > Oh, and we may have to commit things in a very specific order
> > to guarantee the hw doesn't fall over, so yeah definitely per-crtc
> > thread is a no go.
> 
> If I'm understanding the i915 code, this is only the case for modeset
> commits?  I suppose we could achieve the same result by just deciding
> to pick the kthread of the first CRTC for modeset commits.  I'm not
> really so much concerned about parallelism for modeset.

I'm not entirely happy about the random differences between modesets
and other commits. Ideally we wouldn't need any.

Anyways, even if we ignore modesets we still have the issue with
atomicity guarantees across multiple crtcs. So I think we still
don't want per-crtc threads, rather it should be thread for each 
commit.

Well, if the crtcs aren't running in lockstep then maybe we could
shove them off to separate threads, but that'll just complicate things
needlessly I think since we'd need yet another way to iterate
the crtcs in each thread. With the thread-per-commit apporach we
can just use the normal atomic iterators.

> 
> > I don't even understand the serialization argument. If the commits
> > are truly independent then why isn't the unbound wq enough to avoid
> > the serialization? It should just spin up a new thread for each commit
> > no?
> 
> The problem with wq is prioritization and SCHED_FIFO userspace
> components stomping on the feet of commit_work. That is the entire
> motivation of this series in the first place, so no we cannot use
> unbound wq.

This is a bit dejavu of the vblank worker discussion, where I actually
did want a per-crtc RT kthread but people weren't convinced they
actually help. The difference is that for vblank workers we actually
tried to get some numbers, here I've not seen any.

Daniel Vetter Oct. 5, 2020, 2:15 p.m. UTC | #10

On Mon, Oct 05, 2020 at 03:15:24PM +0300, Ville Syrjälä wrote:
> On Fri, Oct 02, 2020 at 10:55:52AM -0700, Rob Clark wrote:
> > On Fri, Oct 2, 2020 at 4:05 AM Ville Syrjälä
> > <ville.syrjala@linux.intel.com> wrote:
> > >
> > > On Fri, Oct 02, 2020 at 01:52:56PM +0300, Ville Syrjälä wrote:
> > > > On Thu, Oct 01, 2020 at 05:25:55PM +0200, Daniel Vetter wrote:
> > > > > On Thu, Oct 1, 2020 at 5:15 PM Rob Clark <robdclark@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Oct 1, 2020 at 12:25 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > >
> > > > > > > On Wed, Sep 30, 2020 at 11:16 PM Rob Clark <robdclark@gmail.com> wrote:
> > > > > > > >
> > > > > > > > From: Rob Clark <robdclark@chromium.org>
> > > > > > > >
> > > > > > > > The android userspace treats the display pipeline as a realtime problem.
> > > > > > > > And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> > > > > > > > it is.  (See https://lwn.net/Articles/809545/ for the best explaination
> > > > > > > > that I found.)
> > > > > > > >
> > > > > > > > But this presents a problem with using workqueues for non-blocking
> > > > > > > > atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> > > > > > > > preempt the worker.  Which is not really the outcome you want.. once
> > > > > > > > the required fences are scheduled, you want to push the atomic commit
> > > > > > > > down to hw ASAP.
> > > > > > > >
> > > > > > > > But the decision of whether commit_work should be RT or not really
> > > > > > > > depends on what userspace is doing.  For a pure CFS userspace display
> > > > > > > > pipeline, commit_work() should remain SCHED_NORMAL.
> > > > > > > >
> > > > > > > > To handle this, convert non-blocking commit_work() to use per-CRTC
> > > > > > > > kthread workers, instead of system_unbound_wq.  Per-CRTC workers are
> > > > > > > > used to avoid serializing commits when userspace is using a per-CRTC
> > > > > > > > update loop.  And the last patch exposes the task id to userspace as
> > > > > > > > a CRTC property, so that userspace can adjust the priority and sched
> > > > > > > > policy to fit it's needs.
> > > > > > > >
> > > > > > > >
> > > > > > > > v2: Drop client cap and in-kernel setting of priority/policy in
> > > > > > > >     favor of exposing the kworker tid to userspace so that user-
> > > > > > > >     space can set priority/policy.
> > > > > > >
> > > > > > > Yeah I think this looks more reasonable. Still a bit irky interface,
> > > > > > > so I'd like to get some kworker/rt ack on this. Other opens:
> > > > > > > - needs userspace, the usual drill
> > > > > >
> > > > > > fwiw, right now the userspace is "modetest + chrt".. *probably* the
> > > > > > userspace will become a standalone helper or daemon, mostly because
> > > > > > the chrome gpu-process sandbox does not allow setting SCHED_FIFO.  I'm
> > > > > > still entertaining the possibility of switching between rt and cfs
> > > > > > depending on what is in the foreground (ie. only do rt for android
> > > > > > apps).
> > > > > >
> > > > > > > - we need this also for vblank workers, otherwise this wont work for
> > > > > > > drivers needing those because of another priority inversion.
> > > > > >
> > > > > > I have a thought on that, see below..
> > > > >
> > > > > Hm, not seeing anything about vblank worker below?
> > > > >
> > > > > > > - we probably want some indication of whether this actually does
> > > > > > > something useful, not all drivers use atomic commit helpers. Not sure
> > > > > > > how to do that.
> > > > > >
> > > > > > I'm leaning towards converting the other drivers over to use the
> > > > > > per-crtc kwork, and then dropping the 'commit_work` from atomic state.
> > > > > > I can add a patch to that, but figured I could postpone that churn
> > > > > > until there is some by-in on this whole idea.
> > > > >
> > > > > i915 has its own commit code, it's not even using the current commit
> > > > > helpers (nor the commit_work). Not sure how much other fun there is.
> > > >
> > > > I don't think we want per-crtc threads for this in i915. Seems
> > > > to me easier to guarantee atomicity across multiple crtcs if
> > > > we just commit them from the same thread.
> > >
> > > Oh, and we may have to commit things in a very specific order
> > > to guarantee the hw doesn't fall over, so yeah definitely per-crtc
> > > thread is a no go.
> > 
> > If I'm understanding the i915 code, this is only the case for modeset
> > commits?  I suppose we could achieve the same result by just deciding
> > to pick the kthread of the first CRTC for modeset commits.  I'm not
> > really so much concerned about parallelism for modeset.
> 
> I'm not entirely happy about the random differences between modesets
> and other commits. Ideally we wouldn't need any.
> 
> Anyways, even if we ignore modesets we still have the issue with
> atomicity guarantees across multiple crtcs. So I think we still
> don't want per-crtc threads, rather it should be thread for each 
> commit.
> 
> Well, if the crtcs aren't running in lockstep then maybe we could
> shove them off to separate threads, but that'll just complicate things
> needlessly I think since we'd need yet another way to iterate
> the crtcs in each thread. With the thread-per-commit apporach we
> can just use the normal atomic iterators.
> 
> > 
> > > I don't even understand the serialization argument. If the commits
> > > are truly independent then why isn't the unbound wq enough to avoid
> > > the serialization? It should just spin up a new thread for each commit
> > > no?
> > 
> > The problem with wq is prioritization and SCHED_FIFO userspace
> > components stomping on the feet of commit_work. That is the entire
> > motivation of this series in the first place, so no we cannot use
> > unbound wq.
> 
> This is a bit dejavu of the vblank worker discussion, where I actually
> did want a per-crtc RT kthread but people weren't convinced they
> actually help. The difference is that for vblank workers we actually
> tried to get some numbers, here I've not seen any.

The problem here is priority inversion, not latency: Android runs
surface-flinger as SCHED_FIFO, so when surfaceflinger does something it
can preempt the kernel's commit work, and we miss a frame. Apparently
otherwise the soft-rt of just having a normal worker (with maybe elevated
niceness) seems nice enough.

Aside: I just double-checked, and vblank work has a per-crtc kthread.
-Daniel

Qais Yousef Oct. 5, 2020, 3 p.m. UTC | #11

+CC Steve and Peter - they might be interested.

On 10/02/20 11:07, Rob Clark wrote:
> On Fri, Oct 2, 2020 at 4:01 AM Qais Yousef <qais.yousef@arm.com> wrote:
> >
> > On 09/30/20 14:17, Rob Clark wrote:
> > > From: Rob Clark <robdclark@chromium.org>
> > >
> > > The android userspace treats the display pipeline as a realtime problem.
> > > And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> > > it is.  (See https://lwn.net/Articles/809545/ for the best explaination
> > > that I found.)
> > >
> > > But this presents a problem with using workqueues for non-blocking
> > > atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> > > preempt the worker.  Which is not really the outcome you want.. once
> > > the required fences are scheduled, you want to push the atomic commit
> > > down to hw ASAP.
> >
> > For me thees 2 properties
> >
> >         1. Run ASAP
> >         2. Finish the work un-interrupted
> >
> > Scream the workers need to be SCHED_FIFO by default. CFS can't give you these
> > guarantees.
> 
> fwiw, commit_work does sleep/block for some time until fences are
> signalled, but then once that happens we want it to run ASAP,
> preempting lower priority SCHED_FIFO.
> 
> >
> > IMO using sched_set_fifo() for these workers is the right thing.
> >
> 
> Possibly, but we still have limited prioritization options (ie. not
> enough) to set these from the kernel.  Giving userspace the control,
> so it can pick sensible priorities for commit_work and vblank_work,
> which fits in with the priorities of the other userspace threads seems
> like the sensible thing.

The problem is that the kernel can run on all types of systems. It's impossible
to pick one value that fits all. Userspace must manage these priorities, and
you can still export the TID to help with that.

But why do you need several priorities in your pipeline? I would have thought
it should execute each stage sequentially and all tasks running at the same RT
priority is fine.

On SMP priorities matter once you've overcomitted the systems. You need to have
more RT tasks running than CPUs for priorities to matter. It seems you have
a high count of RT tasks in your system?

I did some profiles on Android and found that being overcomitted is hard. But
that was a while ago.

> 
> > >
> > > But the decision of whether commit_work should be RT or not really
> > > depends on what userspace is doing.  For a pure CFS userspace display
> > > pipeline, commit_work() should remain SCHED_NORMAL.
> >
> > I'm not sure I agree with this. I think it's better to characterize tasks based
> > on their properties/requirements rather than what the rest of the userspace is
> > using.
> 
> I mean, the issue is that userspace is already using a few different
> rt priority levels for different SF threads.  We want commit_work to

Why are they at different priorities? Different priority levels means that some
of them have more urgent deadlines to meet and it's okay to steal execution
time from lower priority tasks. Is this the case?

RT planning and partitioning is not easy task for sure. You might want to
consider using affinities too to get stronger guarantees for some tasks and
prevent cross-talking.

> run ASAP once fences are signalled, and vblank_work to run at a
> slightly higher priority still.  But the correct choice for priorities
> here depends on what userspace is using, it all needs to fit together
> properly.

By userspace here I think you mean none display pipeline related RT tasks that
you need to coexit with and could still disrupt your pipeline?

Using RT on Gerneral Purpose System is hard for sure. One of the major
challenge is that there's no admin that has full view of the system to do
proper RT planning.

We need proper RT balancer daemon that helps partitioning the system for
multiple RT apps on these systems..

> 
> >
> > I do appreciate that maybe some of these tasks have varying requirements during
> > their life time. e.g: they have RT property during specific critical section
> > but otherwise are CFS tasks. I think the UI thread in Android behaves like
> > that.
> >
> > It's worth IMO trying that approach I pointed out earlier to see if making RT
> > try to pick an idle CPU rather than preempt CFS helps. Not sure if it'd be
> > accepted but IMHO it's a better direction to consider and discuss.
> 
> The problem I was seeing was actually the opposite..  commit_work
> becomes runnable (fences signalled) but doesn't get a chance to run
> because a SCHED_FIFO SF thread is running.  (Maybe I misunderstood and
> you're approach would help this case too?)

Ah okay. Sorry I got it the wrong way around for some reason. I thought this
task is preempting other CFS-based pipelined tasks.

So your system seems to be overcomitted. Is SF short for SufraceFlinger? Under
what scenarios do you have many SurfaceFlinger tasks? On Android I remember
seeing they have priority of 1 or 2.

sched_set_fifo() will use priority 50. If you set all your pipeline tasks
to this priority, what happens?

> 
> > Or maybe you can wrap userspace pipeline critical section lock such that any
> > task holding it will automatically be promoted to SCHED_FIFO and then demoted
> > to CFS once it releases it.
> 
> The SCHED_DEADLINE + token passing approach that the lwn article
> mentioned sounds interesting, if that eventually becomes possible.
> But doesn't really help today..

We were present in the room with Alessio when he gave that talk :-)

You might have seen Valentin's talk in LPC where he's trying to get
proxy-execution into shape. Which is a pre-requisite to enable using of
SCHED_DEADLINE for these scenarios. IIRC it should allow all dependent tasks to
run from the context of the deadline task during the display pipeline critical
section.

By the way, do you have issues with SoftIrqs delaying your RT tasks execution
time?

Thanks

--
Qais Yousef

Rob Clark Oct. 5, 2020, 10:58 p.m. UTC | #12

On Mon, Oct 5, 2020 at 7:15 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Mon, Oct 05, 2020 at 03:15:24PM +0300, Ville Syrjälä wrote:
> > On Fri, Oct 02, 2020 at 10:55:52AM -0700, Rob Clark wrote:
> > > On Fri, Oct 2, 2020 at 4:05 AM Ville Syrjälä
> > > <ville.syrjala@linux.intel.com> wrote:
> > > >
> > > > On Fri, Oct 02, 2020 at 01:52:56PM +0300, Ville Syrjälä wrote:
> > > > > On Thu, Oct 01, 2020 at 05:25:55PM +0200, Daniel Vetter wrote:
> > > > > > On Thu, Oct 1, 2020 at 5:15 PM Rob Clark <robdclark@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 1, 2020 at 12:25 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > >
> > > > > > > > On Wed, Sep 30, 2020 at 11:16 PM Rob Clark <robdclark@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > From: Rob Clark <robdclark@chromium.org>
> > > > > > > > >
> > > > > > > > > The android userspace treats the display pipeline as a realtime problem.
> > > > > > > > > And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> > > > > > > > > it is.  (See https://lwn.net/Articles/809545/ for the best explaination
> > > > > > > > > that I found.)
> > > > > > > > >
> > > > > > > > > But this presents a problem with using workqueues for non-blocking
> > > > > > > > > atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> > > > > > > > > preempt the worker.  Which is not really the outcome you want.. once
> > > > > > > > > the required fences are scheduled, you want to push the atomic commit
> > > > > > > > > down to hw ASAP.
> > > > > > > > >
> > > > > > > > > But the decision of whether commit_work should be RT or not really
> > > > > > > > > depends on what userspace is doing.  For a pure CFS userspace display
> > > > > > > > > pipeline, commit_work() should remain SCHED_NORMAL.
> > > > > > > > >
> > > > > > > > > To handle this, convert non-blocking commit_work() to use per-CRTC
> > > > > > > > > kthread workers, instead of system_unbound_wq.  Per-CRTC workers are
> > > > > > > > > used to avoid serializing commits when userspace is using a per-CRTC
> > > > > > > > > update loop.  And the last patch exposes the task id to userspace as
> > > > > > > > > a CRTC property, so that userspace can adjust the priority and sched
> > > > > > > > > policy to fit it's needs.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > v2: Drop client cap and in-kernel setting of priority/policy in
> > > > > > > > >     favor of exposing the kworker tid to userspace so that user-
> > > > > > > > >     space can set priority/policy.
> > > > > > > >
> > > > > > > > Yeah I think this looks more reasonable. Still a bit irky interface,
> > > > > > > > so I'd like to get some kworker/rt ack on this. Other opens:
> > > > > > > > - needs userspace, the usual drill
> > > > > > >
> > > > > > > fwiw, right now the userspace is "modetest + chrt".. *probably* the
> > > > > > > userspace will become a standalone helper or daemon, mostly because
> > > > > > > the chrome gpu-process sandbox does not allow setting SCHED_FIFO.  I'm
> > > > > > > still entertaining the possibility of switching between rt and cfs
> > > > > > > depending on what is in the foreground (ie. only do rt for android
> > > > > > > apps).
> > > > > > >
> > > > > > > > - we need this also for vblank workers, otherwise this wont work for
> > > > > > > > drivers needing those because of another priority inversion.
> > > > > > >
> > > > > > > I have a thought on that, see below..
> > > > > >
> > > > > > Hm, not seeing anything about vblank worker below?
> > > > > >
> > > > > > > > - we probably want some indication of whether this actually does
> > > > > > > > something useful, not all drivers use atomic commit helpers. Not sure
> > > > > > > > how to do that.
> > > > > > >
> > > > > > > I'm leaning towards converting the other drivers over to use the
> > > > > > > per-crtc kwork, and then dropping the 'commit_work` from atomic state.
> > > > > > > I can add a patch to that, but figured I could postpone that churn
> > > > > > > until there is some by-in on this whole idea.
> > > > > >
> > > > > > i915 has its own commit code, it's not even using the current commit
> > > > > > helpers (nor the commit_work). Not sure how much other fun there is.
> > > > >
> > > > > I don't think we want per-crtc threads for this in i915. Seems
> > > > > to me easier to guarantee atomicity across multiple crtcs if
> > > > > we just commit them from the same thread.
> > > >
> > > > Oh, and we may have to commit things in a very specific order
> > > > to guarantee the hw doesn't fall over, so yeah definitely per-crtc
> > > > thread is a no go.
> > >
> > > If I'm understanding the i915 code, this is only the case for modeset
> > > commits?  I suppose we could achieve the same result by just deciding
> > > to pick the kthread of the first CRTC for modeset commits.  I'm not
> > > really so much concerned about parallelism for modeset.
> >
> > I'm not entirely happy about the random differences between modesets
> > and other commits. Ideally we wouldn't need any.
> >
> > Anyways, even if we ignore modesets we still have the issue with
> > atomicity guarantees across multiple crtcs. So I think we still
> > don't want per-crtc threads, rather it should be thread for each
> > commit.
> >
> > Well, if the crtcs aren't running in lockstep then maybe we could
> > shove them off to separate threads, but that'll just complicate things
> > needlessly I think since we'd need yet another way to iterate
> > the crtcs in each thread. With the thread-per-commit apporach we
> > can just use the normal atomic iterators.
> >
> > >
> > > > I don't even understand the serialization argument. If the commits
> > > > are truly independent then why isn't the unbound wq enough to avoid
> > > > the serialization? It should just spin up a new thread for each commit
> > > > no?
> > >
> > > The problem with wq is prioritization and SCHED_FIFO userspace
> > > components stomping on the feet of commit_work. That is the entire
> > > motivation of this series in the first place, so no we cannot use
> > > unbound wq.
> >
> > This is a bit dejavu of the vblank worker discussion, where I actually
> > did want a per-crtc RT kthread but people weren't convinced they
> > actually help. The difference is that for vblank workers we actually
> > tried to get some numbers, here I've not seen any.
>
> The problem here is priority inversion, not latency: Android runs
> surface-flinger as SCHED_FIFO, so when surfaceflinger does something it
> can preempt the kernel's commit work, and we miss a frame. Apparently
> otherwise the soft-rt of just having a normal worker (with maybe elevated
> niceness) seems nice enough.

yes, exactly, this is about priority inversion.

Not sure if this is clear (you can't really fit all the relevant parts
of the trace in one screenshot), but here is an example of commit_work
preempted by SF EventThread and missing a deadline:

https://usercontent.irccloud-cdn.com/file/Awgp8Sdj/image.png

BR,
-R

>
> Aside: I just double-checked, and vblank work has a per-crtc kthread.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

Rob Clark Oct. 5, 2020, 11:24 p.m. UTC | #13

On Mon, Oct 5, 2020 at 8:00 AM Qais Yousef <qais.yousef@arm.com> wrote:
>
> +CC Steve and Peter - they might be interested.
>
> On 10/02/20 11:07, Rob Clark wrote:
> > On Fri, Oct 2, 2020 at 4:01 AM Qais Yousef <qais.yousef@arm.com> wrote:
> > >
> > > On 09/30/20 14:17, Rob Clark wrote:
> > > > From: Rob Clark <robdclark@chromium.org>
> > > >
> > > > The android userspace treats the display pipeline as a realtime problem.
> > > > And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> > > > it is.  (See https://lwn.net/Articles/809545/ for the best explaination
> > > > that I found.)
> > > >
> > > > But this presents a problem with using workqueues for non-blocking
> > > > atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> > > > preempt the worker.  Which is not really the outcome you want.. once
> > > > the required fences are scheduled, you want to push the atomic commit
> > > > down to hw ASAP.
> > >
> > > For me thees 2 properties
> > >
> > >         1. Run ASAP
> > >         2. Finish the work un-interrupted
> > >
> > > Scream the workers need to be SCHED_FIFO by default. CFS can't give you these
> > > guarantees.
> >
> > fwiw, commit_work does sleep/block for some time until fences are
> > signalled, but then once that happens we want it to run ASAP,
> > preempting lower priority SCHED_FIFO.
> >
> > >
> > > IMO using sched_set_fifo() for these workers is the right thing.
> > >
> >
> > Possibly, but we still have limited prioritization options (ie. not
> > enough) to set these from the kernel.  Giving userspace the control,
> > so it can pick sensible priorities for commit_work and vblank_work,
> > which fits in with the priorities of the other userspace threads seems
> > like the sensible thing.
>
> The problem is that the kernel can run on all types of systems. It's impossible
> to pick one value that fits all. Userspace must manage these priorities, and
> you can still export the TID to help with that.
>
> But why do you need several priorities in your pipeline? I would have thought
> it should execute each stage sequentially and all tasks running at the same RT
> priority is fine.

On the kernel side, vblank work should complete during the vblank
period, making it a harder real time requirement.  So the thinking is
this should be a higher priority.

But you are right, if you aren't overcommitted it probably doesn't matter.

> On SMP priorities matter once you've overcomitted the systems. You need to have
> more RT tasks running than CPUs for priorities to matter. It seems you have
> a high count of RT tasks in your system?
>
> I did some profiles on Android and found that being overcomitted is hard. But
> that was a while ago.
>
> >
> > > >
> > > > But the decision of whether commit_work should be RT or not really
> > > > depends on what userspace is doing.  For a pure CFS userspace display
> > > > pipeline, commit_work() should remain SCHED_NORMAL.
> > >
> > > I'm not sure I agree with this. I think it's better to characterize tasks based
> > > on their properties/requirements rather than what the rest of the userspace is
> > > using.
> >
> > I mean, the issue is that userspace is already using a few different
> > rt priority levels for different SF threads.  We want commit_work to
>
> Why are they at different priorities? Different priority levels means that some
> of them have more urgent deadlines to meet and it's okay to steal execution
> time from lower priority tasks. Is this the case?

tbh, I'm not fully aware of the background.  It looks like most of the
SF threads run at priority=2 (100-2==98), and the main one runs at
priority=1

> RT planning and partitioning is not easy task for sure. You might want to
> consider using affinities too to get stronger guarantees for some tasks and
> prevent cross-talking.

There is some cgroup stuff that is pinning SF and some other stuff to
the small cores, fwiw.. I think the reasoning is that they shouldn't
be doing anything heavy enough to need the big cores.

> > run ASAP once fences are signalled, and vblank_work to run at a
> > slightly higher priority still.  But the correct choice for priorities
> > here depends on what userspace is using, it all needs to fit together
> > properly.
>
> By userspace here I think you mean none display pipeline related RT tasks that
> you need to coexit with and could still disrupt your pipeline?

I mean, commit_work should be higher priority than the other (display
related) RT tasks.  But the kernel doesn't know what those priorities
are.

> Using RT on Gerneral Purpose System is hard for sure. One of the major
> challenge is that there's no admin that has full view of the system to do
> proper RT planning.
>
> We need proper RT balancer daemon that helps partitioning the system for
> multiple RT apps on these systems..
>
> >
> > >
> > > I do appreciate that maybe some of these tasks have varying requirements during
> > > their life time. e.g: they have RT property during specific critical section
> > > but otherwise are CFS tasks. I think the UI thread in Android behaves like
> > > that.
> > >
> > > It's worth IMO trying that approach I pointed out earlier to see if making RT
> > > try to pick an idle CPU rather than preempt CFS helps. Not sure if it'd be
> > > accepted but IMHO it's a better direction to consider and discuss.
> >
> > The problem I was seeing was actually the opposite..  commit_work
> > becomes runnable (fences signalled) but doesn't get a chance to run
> > because a SCHED_FIFO SF thread is running.  (Maybe I misunderstood and
> > you're approach would help this case too?)
>
> Ah okay. Sorry I got it the wrong way around for some reason. I thought this
> task is preempting other CFS-based pipelined tasks.
>
> So your system seems to be overcomitted. Is SF short for SufraceFlinger? Under
> what scenarios do you have many SurfaceFlinger tasks? On Android I remember
> seeing they have priority of 1 or 2.

yeah, SF==SurfaceFlinger, and yeah, 1 and 2..

> sched_set_fifo() will use priority 50. If you set all your pipeline tasks
> to this priority, what happens?

I think this would work.. drm/msm doesn't use vblank work, so I
wouldn't really have problems with commit_work preempting vblank_work.
But I think the best option (and to handle the case if android changes
the RT priorties around in the future) is to let userspace set the
priorities.

> >
> > > Or maybe you can wrap userspace pipeline critical section lock such that any
> > > task holding it will automatically be promoted to SCHED_FIFO and then demoted
> > > to CFS once it releases it.
> >
> > The SCHED_DEADLINE + token passing approach that the lwn article
> > mentioned sounds interesting, if that eventually becomes possible.
> > But doesn't really help today..
>
> We were present in the room with Alessio when he gave that talk :-)
>
> You might have seen Valentin's talk in LPC where he's trying to get
> proxy-execution into shape. Which is a pre-requisite to enable using of
> SCHED_DEADLINE for these scenarios. IIRC it should allow all dependent tasks to
> run from the context of the deadline task during the display pipeline critical
> section.
>
> By the way, do you have issues with SoftIrqs delaying your RT tasks execution
> time?

I don't *think* so, but I'm not 100% sure if they are showing up in
traces.  So far it seems like SF stomping on commit_work.  (There is
the added complication that there are some chrome gpu-process tasks in
between SF and the display, including CrGpuMain (which really doesn't
want to be SCHED_FIFO when executing gl commands on behalf of
something unrelated to the compositor.. the deadline approach, IIUC,
might be the better option eventually for this?)

BR,
-R

>
> Thanks
>
> --
> Qais Yousef

Daniel Vetter Oct. 6, 2020, 9:08 a.m. UTC | #14

On Mon, Oct 05, 2020 at 04:24:38PM -0700, Rob Clark wrote:
> On Mon, Oct 5, 2020 at 8:00 AM Qais Yousef <qais.yousef@arm.com> wrote:
> >
> > +CC Steve and Peter - they might be interested.
> >
> > On 10/02/20 11:07, Rob Clark wrote:
> > > On Fri, Oct 2, 2020 at 4:01 AM Qais Yousef <qais.yousef@arm.com> wrote:
> > > >
> > > > On 09/30/20 14:17, Rob Clark wrote:
> > > > > From: Rob Clark <robdclark@chromium.org>
> > > > >
> > > > > The android userspace treats the display pipeline as a realtime problem.
> > > > > And arguably, if your goal is to not miss frame deadlines (ie. vblank),
> > > > > it is.  (See https://lwn.net/Articles/809545/ for the best explaination
> > > > > that I found.)
> > > > >
> > > > > But this presents a problem with using workqueues for non-blocking
> > > > > atomic commit_work(), because the SCHED_FIFO userspace thread(s) can
> > > > > preempt the worker.  Which is not really the outcome you want.. once
> > > > > the required fences are scheduled, you want to push the atomic commit
> > > > > down to hw ASAP.
> > > >
> > > > For me thees 2 properties
> > > >
> > > >         1. Run ASAP
> > > >         2. Finish the work un-interrupted
> > > >
> > > > Scream the workers need to be SCHED_FIFO by default. CFS can't give you these
> > > > guarantees.
> > >
> > > fwiw, commit_work does sleep/block for some time until fences are
> > > signalled, but then once that happens we want it to run ASAP,
> > > preempting lower priority SCHED_FIFO.
> > >
> > > >
> > > > IMO using sched_set_fifo() for these workers is the right thing.
> > > >
> > >
> > > Possibly, but we still have limited prioritization options (ie. not
> > > enough) to set these from the kernel.  Giving userspace the control,
> > > so it can pick sensible priorities for commit_work and vblank_work,
> > > which fits in with the priorities of the other userspace threads seems
> > > like the sensible thing.
> >
> > The problem is that the kernel can run on all types of systems. It's impossible
> > to pick one value that fits all. Userspace must manage these priorities, and
> > you can still export the TID to help with that.
> >
> > But why do you need several priorities in your pipeline? I would have thought
> > it should execute each stage sequentially and all tasks running at the same RT
> > priority is fine.
> 
> On the kernel side, vblank work should complete during the vblank
> period, making it a harder real time requirement.  So the thinking is
> this should be a higher priority.
> 
> But you are right, if you aren't overcommitted it probably doesn't matter.

vblank work needs to preempt commit work.

Right now we don't have any driver requiring this, but if we e.g. roll out
the gamma table update for i915, then this _has_ to happen in the vblank
period.

Whereas the commit work can happen in there, but it can also be delayed a
bit (until the vblank worker has finished) we will not miss any additional
deadline due to that.

So that's why we have 2 levels. I'm not even sure you can model that with
SCHED_DEADLINE, since essentially we need a few usec of cpu time very
vblank (16ms normally), but thos few usec _must_ be scheduled within a
very specific time slot or we're toast. And that vblank period is only
1-2ms usually.

> > On SMP priorities matter once you've overcomitted the systems. You need to have
> > more RT tasks running than CPUs for priorities to matter. It seems you have
> > a high count of RT tasks in your system?
> >
> > I did some profiles on Android and found that being overcomitted is hard. But
> > that was a while ago.
> >
> > >
> > > > >
> > > > > But the decision of whether commit_work should be RT or not really
> > > > > depends on what userspace is doing.  For a pure CFS userspace display
> > > > > pipeline, commit_work() should remain SCHED_NORMAL.
> > > >
> > > > I'm not sure I agree with this. I think it's better to characterize tasks based
> > > > on their properties/requirements rather than what the rest of the userspace is
> > > > using.
> > >
> > > I mean, the issue is that userspace is already using a few different
> > > rt priority levels for different SF threads.  We want commit_work to
> >
> > Why are they at different priorities? Different priority levels means that some
> > of them have more urgent deadlines to meet and it's okay to steal execution
> > time from lower priority tasks. Is this the case?
> 
> tbh, I'm not fully aware of the background.  It looks like most of the
> SF threads run at priority=2 (100-2==98), and the main one runs at
> priority=1
> 
> > RT planning and partitioning is not easy task for sure. You might want to
> > consider using affinities too to get stronger guarantees for some tasks and
> > prevent cross-talking.
> 
> There is some cgroup stuff that is pinning SF and some other stuff to
> the small cores, fwiw.. I think the reasoning is that they shouldn't
> be doing anything heavy enough to need the big cores.
> 
> > > run ASAP once fences are signalled, and vblank_work to run at a
> > > slightly higher priority still.  But the correct choice for priorities
> > > here depends on what userspace is using, it all needs to fit together
> > > properly.
> >
> > By userspace here I think you mean none display pipeline related RT tasks that
> > you need to coexit with and could still disrupt your pipeline?
> 
> I mean, commit_work should be higher priority than the other (display
> related) RT tasks.  But the kernel doesn't know what those priorities
> are.
> 
> > Using RT on Gerneral Purpose System is hard for sure. One of the major
> > challenge is that there's no admin that has full view of the system to do
> > proper RT planning.
> >
> > We need proper RT balancer daemon that helps partitioning the system for
> > multiple RT apps on these systems..
> >
> > >
> > > >
> > > > I do appreciate that maybe some of these tasks have varying requirements during
> > > > their life time. e.g: they have RT property during specific critical section
> > > > but otherwise are CFS tasks. I think the UI thread in Android behaves like
> > > > that.
> > > >
> > > > It's worth IMO trying that approach I pointed out earlier to see if making RT
> > > > try to pick an idle CPU rather than preempt CFS helps. Not sure if it'd be
> > > > accepted but IMHO it's a better direction to consider and discuss.
> > >
> > > The problem I was seeing was actually the opposite..  commit_work
> > > becomes runnable (fences signalled) but doesn't get a chance to run
> > > because a SCHED_FIFO SF thread is running.  (Maybe I misunderstood and
> > > you're approach would help this case too?)
> >
> > Ah okay. Sorry I got it the wrong way around for some reason. I thought this
> > task is preempting other CFS-based pipelined tasks.
> >
> > So your system seems to be overcomitted. Is SF short for SufraceFlinger? Under
> > what scenarios do you have many SurfaceFlinger tasks? On Android I remember
> > seeing they have priority of 1 or 2.
> 
> yeah, SF==SurfaceFlinger, and yeah, 1 and 2..
> 
> > sched_set_fifo() will use priority 50. If you set all your pipeline tasks
> > to this priority, what happens?
> 
> I think this would work.. drm/msm doesn't use vblank work, so I
> wouldn't really have problems with commit_work preempting vblank_work.
> But I think the best option (and to handle the case if android changes
> the RT priorties around in the future) is to let userspace set the
> priorities.
> 
> > >
> > > > Or maybe you can wrap userspace pipeline critical section lock such that any
> > > > task holding it will automatically be promoted to SCHED_FIFO and then demoted
> > > > to CFS once it releases it.
> > >
> > > The SCHED_DEADLINE + token passing approach that the lwn article
> > > mentioned sounds interesting, if that eventually becomes possible.
> > > But doesn't really help today..
> >
> > We were present in the room with Alessio when he gave that talk :-)
> >
> > You might have seen Valentin's talk in LPC where he's trying to get
> > proxy-execution into shape. Which is a pre-requisite to enable using of
> > SCHED_DEADLINE for these scenarios. IIRC it should allow all dependent tasks to
> > run from the context of the deadline task during the display pipeline critical
> > section.
> >
> > By the way, do you have issues with SoftIrqs delaying your RT tasks execution
> > time?
> 
> I don't *think* so, but I'm not 100% sure if they are showing up in
> traces.  So far it seems like SF stomping on commit_work.  (There is
> the added complication that there are some chrome gpu-process tasks in
> between SF and the display, including CrGpuMain (which really doesn't
> want to be SCHED_FIFO when executing gl commands on behalf of
> something unrelated to the compositor.. the deadline approach, IIUC,
> might be the better option eventually for this?)

deadline has the upshot that it compose much better than SCHED_FIFO:
Everyone just drops their deadline requirements onto the scheduler,
scheduler makes sure it's all obeyed (or rejects your request).

The trouble is we'd need to know how long a commit takes, worst case, on a
given platform. And for that you need to measure stuff, and we kinda can't
spend a few minutes at boot-up going through the combinatorial maze of
atomic commits to make sure we have it all.

So I think in practice letting userspace set the right rt priority/mode is
the only way to go here :-/
-Daniel

> 
> BR,
> -R
> 
> >
> > Thanks
> >
> > --
> > Qais Yousef

Peter Zijlstra Oct. 6, 2020, 10:01 a.m. UTC | #15

On Tue, Oct 06, 2020 at 11:08:59AM +0200, Daniel Vetter wrote:
> vblank work needs to preempt commit work.
> 
> Right now we don't have any driver requiring this, but if we e.g. roll out
> the gamma table update for i915, then this _has_ to happen in the vblank
> period.
> 
> Whereas the commit work can happen in there, but it can also be delayed a
> bit (until the vblank worker has finished) we will not miss any additional
> deadline due to that.
> 
> So that's why we have 2 levels. I'm not even sure you can model that with
> SCHED_DEADLINE, since essentially we need a few usec of cpu time very
> vblank (16ms normally), but thos few usec _must_ be scheduled within a
> very specific time slot or we're toast. And that vblank period is only
> 1-2ms usually.

Depends a bit on what the hardware gets us. If for example we're
provided an interrupt on vblank start, then that could wake a DEADLINE
job with (given your numbers above):

 .sched_period = 16ms,
 .sched_deadline = 1-2ms,
 .sched_runtime = 1-2ms,

The effective utilization of that task would be: 1-2/16.

> deadline has the upshot that it compose much better than SCHED_FIFO:
> Everyone just drops their deadline requirements onto the scheduler,
> scheduler makes sure it's all obeyed (or rejects your request).
> 
> The trouble is we'd need to know how long a commit takes, worst case, on a
> given platform. And for that you need to measure stuff, and we kinda can't
> spend a few minutes at boot-up going through the combinatorial maze of
> atomic commits to make sure we have it all.
> 
> So I think in practice letting userspace set the right rt priority/mode is
> the only way to go here :-/

Or you can have it adjust it's expected runtime as the system runs
(always keeping a little extra room over what you measure to make sure).

Given you have period > deadline, you can simply start with runtime =
deadline and adjust downwards during use (carefully).

Qais Yousef Oct. 6, 2020, 10:59 a.m. UTC | #16

On 10/05/20 16:24, Rob Clark wrote:

[...]

> > RT planning and partitioning is not easy task for sure. You might want to
> > consider using affinities too to get stronger guarantees for some tasks and
> > prevent cross-talking.
> 
> There is some cgroup stuff that is pinning SF and some other stuff to
> the small cores, fwiw.. I think the reasoning is that they shouldn't
> be doing anything heavy enough to need the big cores.

Ah, so you're on big.LITTLE type of system. I have done some work which enables
biasing RT tasks towards big cores and control the default boost value if you
have util_clamp and schedutil enabled. You can use util_clamp in general to
help with DVFS related response time delays.

I haven't done any work to try our best to pick a small core first but fallback
to big if there's no other alternative.

It'd be interesting to know how often you end up on a big core if you remove
the affinity. The RT scheduler picks the first cpu in the lowest priority mask.
So it should have this bias towards picking smaller cores first if they're
in the lower priority mask (ie: not running higher priority RT tasks).

So unless you absolutely don't want any RT tasks on a big cores, it'd be worth
removing this affinity and check the percentage of time you spend on little
cores. This should help with your worst case scenario as you make more cpus
available.

> > > run ASAP once fences are signalled, and vblank_work to run at a
> > > slightly higher priority still.  But the correct choice for priorities
> > > here depends on what userspace is using, it all needs to fit together
> > > properly.
> >
> > By userspace here I think you mean none display pipeline related RT tasks that
> > you need to coexit with and could still disrupt your pipeline?
> 
> I mean, commit_work should be higher priority than the other (display
> related) RT tasks.  But the kernel doesn't know what those priorities
> are.

So if you set commit_work to sched_set_fifo(), it'd be at a reasonably high
priority (50) by default. Which means you just need to manage your SF
priorities without having to change commit_work priority itself?

> 
> > Using RT on Gerneral Purpose System is hard for sure. One of the major
> > challenge is that there's no admin that has full view of the system to do
> > proper RT planning.
> >
> > We need proper RT balancer daemon that helps partitioning the system for
> > multiple RT apps on these systems..
> >
> > >
> > > >
> > > > I do appreciate that maybe some of these tasks have varying requirements during
> > > > their life time. e.g: they have RT property during specific critical section
> > > > but otherwise are CFS tasks. I think the UI thread in Android behaves like
> > > > that.
> > > >
> > > > It's worth IMO trying that approach I pointed out earlier to see if making RT
> > > > try to pick an idle CPU rather than preempt CFS helps. Not sure if it'd be
> > > > accepted but IMHO it's a better direction to consider and discuss.
> > >
> > > The problem I was seeing was actually the opposite..  commit_work
> > > becomes runnable (fences signalled) but doesn't get a chance to run
> > > because a SCHED_FIFO SF thread is running.  (Maybe I misunderstood and
> > > you're approach would help this case too?)
> >
> > Ah okay. Sorry I got it the wrong way around for some reason. I thought this
> > task is preempting other CFS-based pipelined tasks.
> >
> > So your system seems to be overcomitted. Is SF short for SufraceFlinger? Under
> > what scenarios do you have many SurfaceFlinger tasks? On Android I remember
> > seeing they have priority of 1 or 2.
> 
> yeah, SF==SurfaceFlinger, and yeah, 1 and 2..
> 
> > sched_set_fifo() will use priority 50. If you set all your pipeline tasks
> > to this priority, what happens?
> 
> I think this would work.. drm/msm doesn't use vblank work, so I
> wouldn't really have problems with commit_work preempting vblank_work.
> But I think the best option (and to handle the case if android changes
> the RT priorties around in the future) is to let userspace set the
> priorities.

I don't really mind. But it seems better for me if we know that two kernel
threads need to have a specific relative priorities to each others then to
handle this in the kernel properly. Userspace will only need then to worry
about managing its *own* priorities relative to that.

Just seen Peter suggesting in another email to use SCHED_DEADLINE for vblank
work. Which I think achieves the above if commit_work uses sched_set_fifo().

> 
> > >
> > > > Or maybe you can wrap userspace pipeline critical section lock such that any
> > > > task holding it will automatically be promoted to SCHED_FIFO and then demoted
> > > > to CFS once it releases it.
> > >
> > > The SCHED_DEADLINE + token passing approach that the lwn article
> > > mentioned sounds interesting, if that eventually becomes possible.
> > > But doesn't really help today..
> >
> > We were present in the room with Alessio when he gave that talk :-)
> >
> > You might have seen Valentin's talk in LPC where he's trying to get
> > proxy-execution into shape. Which is a pre-requisite to enable using of
> > SCHED_DEADLINE for these scenarios. IIRC it should allow all dependent tasks to
> > run from the context of the deadline task during the display pipeline critical
> > section.
> >
> > By the way, do you have issues with SoftIrqs delaying your RT tasks execution
> > time?
> 
> I don't *think* so, but I'm not 100% sure if they are showing up in

If you ever get a chance to run a high network throughput test, it might help
to see if softirqs are affecting you. I know Android has issues with this under
some circumstances.

> traces.  So far it seems like SF stomping on commit_work.  (There is
> the added complication that there are some chrome gpu-process tasks in
> between SF and the display, including CrGpuMain (which really doesn't
> want to be SCHED_FIFO when executing gl commands on behalf of
> something unrelated to the compositor.. the deadline approach, IIUC,
> might be the better option eventually for this?)

If you meant sched_deadline + token approach, then yeah I think it'd be better.
But as you said, we can't do this yet :/

But as Peter pointed out, this doesn't mean you can't use SCHED_DEADLINE at all
if it does make sense.

Thanks

--
Qais Yousef

Rob Clark Oct. 6, 2020, 8:04 p.m. UTC | #17

On Tue, Oct 6, 2020 at 3:59 AM Qais Yousef <qais.yousef@arm.com> wrote:
>
> On 10/05/20 16:24, Rob Clark wrote:
>
> [...]
>
> > > RT planning and partitioning is not easy task for sure. You might want to
> > > consider using affinities too to get stronger guarantees for some tasks and
> > > prevent cross-talking.
> >
> > There is some cgroup stuff that is pinning SF and some other stuff to
> > the small cores, fwiw.. I think the reasoning is that they shouldn't
> > be doing anything heavy enough to need the big cores.
>
> Ah, so you're on big.LITTLE type of system. I have done some work which enables
> biasing RT tasks towards big cores and control the default boost value if you
> have util_clamp and schedutil enabled. You can use util_clamp in general to
> help with DVFS related response time delays.
>
> I haven't done any work to try our best to pick a small core first but fallback
> to big if there's no other alternative.
>
> It'd be interesting to know how often you end up on a big core if you remove
> the affinity. The RT scheduler picks the first cpu in the lowest priority mask.
> So it should have this bias towards picking smaller cores first if they're
> in the lower priority mask (ie: not running higher priority RT tasks).

fwiw, the issue I'm looking at is actually at the opposite end of the
spectrum, less demanding apps that let cpus throttle down to low
OPPs.. which stretches out the time taken at each step in the path
towards screen (which seems to improve the odds that we hit priority
inversion scenarios with SCHED_FIFO things stomping on important CFS
things)

There is a *big* difference in # of cpu cycles per frame between
highest and lowest OPP..

BR,
-R

> So unless you absolutely don't want any RT tasks on a big cores, it'd be worth
> removing this affinity and check the percentage of time you spend on little
> cores. This should help with your worst case scenario as you make more cpus
> available.
>
> > > > run ASAP once fences are signalled, and vblank_work to run at a
> > > > slightly higher priority still.  But the correct choice for priorities
> > > > here depends on what userspace is using, it all needs to fit together
> > > > properly.
> > >
> > > By userspace here I think you mean none display pipeline related RT tasks that
> > > you need to coexit with and could still disrupt your pipeline?
> >
> > I mean, commit_work should be higher priority than the other (display
> > related) RT tasks.  But the kernel doesn't know what those priorities
> > are.
>
> So if you set commit_work to sched_set_fifo(), it'd be at a reasonably high
> priority (50) by default. Which means you just need to manage your SF
> priorities without having to change commit_work priority itself?
>
> >
> > > Using RT on Gerneral Purpose System is hard for sure. One of the major
> > > challenge is that there's no admin that has full view of the system to do
> > > proper RT planning.
> > >
> > > We need proper RT balancer daemon that helps partitioning the system for
> > > multiple RT apps on these systems..
> > >
> > > >
> > > > >
> > > > > I do appreciate that maybe some of these tasks have varying requirements during
> > > > > their life time. e.g: they have RT property during specific critical section
> > > > > but otherwise are CFS tasks. I think the UI thread in Android behaves like
> > > > > that.
> > > > >
> > > > > It's worth IMO trying that approach I pointed out earlier to see if making RT
> > > > > try to pick an idle CPU rather than preempt CFS helps. Not sure if it'd be
> > > > > accepted but IMHO it's a better direction to consider and discuss.
> > > >
> > > > The problem I was seeing was actually the opposite..  commit_work
> > > > becomes runnable (fences signalled) but doesn't get a chance to run
> > > > because a SCHED_FIFO SF thread is running.  (Maybe I misunderstood and
> > > > you're approach would help this case too?)
> > >
> > > Ah okay. Sorry I got it the wrong way around for some reason. I thought this
> > > task is preempting other CFS-based pipelined tasks.
> > >
> > > So your system seems to be overcomitted. Is SF short for SufraceFlinger? Under
> > > what scenarios do you have many SurfaceFlinger tasks? On Android I remember
> > > seeing they have priority of 1 or 2.
> >
> > yeah, SF==SurfaceFlinger, and yeah, 1 and 2..
> >
> > > sched_set_fifo() will use priority 50. If you set all your pipeline tasks
> > > to this priority, what happens?
> >
> > I think this would work.. drm/msm doesn't use vblank work, so I
> > wouldn't really have problems with commit_work preempting vblank_work.
> > But I think the best option (and to handle the case if android changes
> > the RT priorties around in the future) is to let userspace set the
> > priorities.
>
> I don't really mind. But it seems better for me if we know that two kernel
> threads need to have a specific relative priorities to each others then to
> handle this in the kernel properly. Userspace will only need then to worry
> about managing its *own* priorities relative to that.
>
> Just seen Peter suggesting in another email to use SCHED_DEADLINE for vblank
> work. Which I think achieves the above if commit_work uses sched_set_fifo().
>
> >
> > > >
> > > > > Or maybe you can wrap userspace pipeline critical section lock such that any
> > > > > task holding it will automatically be promoted to SCHED_FIFO and then demoted
> > > > > to CFS once it releases it.
> > > >
> > > > The SCHED_DEADLINE + token passing approach that the lwn article
> > > > mentioned sounds interesting, if that eventually becomes possible.
> > > > But doesn't really help today..
> > >
> > > We were present in the room with Alessio when he gave that talk :-)
> > >
> > > You might have seen Valentin's talk in LPC where he's trying to get
> > > proxy-execution into shape. Which is a pre-requisite to enable using of
> > > SCHED_DEADLINE for these scenarios. IIRC it should allow all dependent tasks to
> > > run from the context of the deadline task during the display pipeline critical
> > > section.
> > >
> > > By the way, do you have issues with SoftIrqs delaying your RT tasks execution
> > > time?
> >
> > I don't *think* so, but I'm not 100% sure if they are showing up in
>
> If you ever get a chance to run a high network throughput test, it might help
> to see if softirqs are affecting you. I know Android has issues with this under
> some circumstances.
>
> > traces.  So far it seems like SF stomping on commit_work.  (There is
> > the added complication that there are some chrome gpu-process tasks in
> > between SF and the display, including CrGpuMain (which really doesn't
> > want to be SCHED_FIFO when executing gl commands on behalf of
> > something unrelated to the compositor.. the deadline approach, IIUC,
> > might be the better option eventually for this?)
>
> If you meant sched_deadline + token approach, then yeah I think it'd be better.
> But as you said, we can't do this yet :/
>
> But as Peter pointed out, this doesn't mean you can't use SCHED_DEADLINE at all
> if it does make sense.
>
> Thanks
>
> --
> Qais Yousef

Qais Yousef Oct. 7, 2020, 10:36 a.m. UTC | #18

On 10/06/20 13:04, Rob Clark wrote:
> On Tue, Oct 6, 2020 at 3:59 AM Qais Yousef <qais.yousef@arm.com> wrote:
> >
> > On 10/05/20 16:24, Rob Clark wrote:
> >
> > [...]
> >
> > > > RT planning and partitioning is not easy task for sure. You might want to
> > > > consider using affinities too to get stronger guarantees for some tasks and
> > > > prevent cross-talking.
> > >
> > > There is some cgroup stuff that is pinning SF and some other stuff to
> > > the small cores, fwiw.. I think the reasoning is that they shouldn't
> > > be doing anything heavy enough to need the big cores.
> >
> > Ah, so you're on big.LITTLE type of system. I have done some work which enables
> > biasing RT tasks towards big cores and control the default boost value if you
> > have util_clamp and schedutil enabled. You can use util_clamp in general to
> > help with DVFS related response time delays.
> >
> > I haven't done any work to try our best to pick a small core first but fallback
> > to big if there's no other alternative.
> >
> > It'd be interesting to know how often you end up on a big core if you remove
> > the affinity. The RT scheduler picks the first cpu in the lowest priority mask.
> > So it should have this bias towards picking smaller cores first if they're
> > in the lower priority mask (ie: not running higher priority RT tasks).
> 
> fwiw, the issue I'm looking at is actually at the opposite end of the
> spectrum, less demanding apps that let cpus throttle down to low
> OPPs.. which stretches out the time taken at each step in the path
> towards screen (which seems to improve the odds that we hit priority
> inversion scenarios with SCHED_FIFO things stomping on important CFS
> things)

So you do have the problem of RT task preempting an important CFS task.

> 
> There is a *big* difference in # of cpu cycles per frame between
> highest and lowest OPP..

To combat DVFS related delays, you can use util clamp.

Hopefully this article helps explain it if you didn't come across it before

	https://lwn.net/Articles/762043/

You can use sched_setattr() to set SCHED_FLAG_UTIL_CLAMP_MIN for a task. This
will guarantee everytime this task is running it'll appear it has at least
this utilization value, so schedutil governor (which must be used for this to
work) will pick up the right performance point (OPP).

The scheduler will try its best to make sure that the task will run on a core
that meets the minimum requested performance point (hinted by setting
uclamp_min).

Thanks

--
Qais Yousef

Rob Clark Oct. 7, 2020, 3:57 p.m. UTC | #19

On Wed, Oct 7, 2020 at 3:36 AM Qais Yousef <qais.yousef@arm.com> wrote:
>
> On 10/06/20 13:04, Rob Clark wrote:
> > On Tue, Oct 6, 2020 at 3:59 AM Qais Yousef <qais.yousef@arm.com> wrote:
> > >
> > > On 10/05/20 16:24, Rob Clark wrote:
> > >
> > > [...]
> > >
> > > > > RT planning and partitioning is not easy task for sure. You might want to
> > > > > consider using affinities too to get stronger guarantees for some tasks and
> > > > > prevent cross-talking.
> > > >
> > > > There is some cgroup stuff that is pinning SF and some other stuff to
> > > > the small cores, fwiw.. I think the reasoning is that they shouldn't
> > > > be doing anything heavy enough to need the big cores.
> > >
> > > Ah, so you're on big.LITTLE type of system. I have done some work which enables
> > > biasing RT tasks towards big cores and control the default boost value if you
> > > have util_clamp and schedutil enabled. You can use util_clamp in general to
> > > help with DVFS related response time delays.
> > >
> > > I haven't done any work to try our best to pick a small core first but fallback
> > > to big if there's no other alternative.
> > >
> > > It'd be interesting to know how often you end up on a big core if you remove
> > > the affinity. The RT scheduler picks the first cpu in the lowest priority mask.
> > > So it should have this bias towards picking smaller cores first if they're
> > > in the lower priority mask (ie: not running higher priority RT tasks).
> >
> > fwiw, the issue I'm looking at is actually at the opposite end of the
> > spectrum, less demanding apps that let cpus throttle down to low
> > OPPs.. which stretches out the time taken at each step in the path
> > towards screen (which seems to improve the odds that we hit priority
> > inversion scenarios with SCHED_FIFO things stomping on important CFS
> > things)
>
> So you do have the problem of RT task preempting an important CFS task.
>
> >
> > There is a *big* difference in # of cpu cycles per frame between
> > highest and lowest OPP..
>
> To combat DVFS related delays, you can use util clamp.
>
> Hopefully this article helps explain it if you didn't come across it before
>
>         https://lwn.net/Articles/762043/
>
> You can use sched_setattr() to set SCHED_FLAG_UTIL_CLAMP_MIN for a task. This
> will guarantee everytime this task is running it'll appear it has at least
> this utilization value, so schedutil governor (which must be used for this to
> work) will pick up the right performance point (OPP).
>
> The scheduler will try its best to make sure that the task will run on a core
> that meets the minimum requested performance point (hinted by setting
> uclamp_min).

Yeah, I think we will end up making some use of uclamp.. there is
someone else working on that angle

But without it, this is a case that exposes legit prioritization
problems with commit_work which we should fix ;-)

BR,
-R

>
> Thanks
>
> --
> Qais Yousef

Qais Yousef Oct. 7, 2020, 4:30 p.m. UTC | #20

On 10/07/20 08:57, Rob Clark wrote:
> Yeah, I think we will end up making some use of uclamp.. there is
> someone else working on that angle
> 
> But without it, this is a case that exposes legit prioritization
> problems with commit_work which we should fix ;-)

I wasn't suggesting this as an alternative to fixing the other problem. But it
seemed you had a different problem here that I thought I could help with :-)

I did give my opinion about how to handle that priority issue. If the 2 threads
are kernel threads and by design they need relative priorities IMO the kernel
need to be taught to set this relative priority. It seemed the vblank worker
could run as SCHED_DEADLINE. If this works, then the priority problem for
commit_work disappears as SCHED_DEADLINE will preempt RT. If commit_work uses
sched_set_fifo(), its priority will be 50, hence your SF threads can no longer
preempt it. And you can manage the SF threads to be any value you want relative
to 50 anyway without having to manage commit_work itself.

I'm not sure if you have problems with RT tasks preempting important CFS
tasks. My brain registered two conflicting statements.

Thanks

--
Qais Yousef

Rob Clark Oct. 7, 2020, 4:44 p.m. UTC | #21

On Mon, Oct 5, 2020 at 5:15 AM Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Fri, Oct 02, 2020 at 10:55:52AM -0700, Rob Clark wrote:
> > On Fri, Oct 2, 2020 at 4:05 AM Ville Syrjälä
> > <ville.syrjala@linux.intel.com> wrote:
> > >
> > > On Fri, Oct 02, 2020 at 01:52:56PM +0300, Ville Syrjälä wrote:
> > > > On Thu, Oct 01, 2020 at 05:25:55PM +0200, Daniel Vetter wrote:
> > > > > On Thu, Oct 1, 2020 at 5:15 PM Rob Clark <robdclark@gmail.com> wrote:
> > > > > >
> > > > > > I'm leaning towards converting the other drivers over to use the
> > > > > > per-crtc kwork, and then dropping the 'commit_work` from atomic state.
> > > > > > I can add a patch to that, but figured I could postpone that churn
> > > > > > until there is some by-in on this whole idea.
> > > > >
> > > > > i915 has its own commit code, it's not even using the current commit
> > > > > helpers (nor the commit_work). Not sure how much other fun there is.
> > > >
> > > > I don't think we want per-crtc threads for this in i915. Seems
> > > > to me easier to guarantee atomicity across multiple crtcs if
> > > > we just commit them from the same thread.
> > >
> > > Oh, and we may have to commit things in a very specific order
> > > to guarantee the hw doesn't fall over, so yeah definitely per-crtc
> > > thread is a no go.
> >
> > If I'm understanding the i915 code, this is only the case for modeset
> > commits?  I suppose we could achieve the same result by just deciding
> > to pick the kthread of the first CRTC for modeset commits.  I'm not
> > really so much concerned about parallelism for modeset.
>
> I'm not entirely happy about the random differences between modesets
> and other commits. Ideally we wouldn't need any.
>
> Anyways, even if we ignore modesets we still have the issue with
> atomicity guarantees across multiple crtcs. So I think we still
> don't want per-crtc threads, rather it should be thread for each
> commit.

I don't really see any other way to solve the priority inversion other
than per-CRTC kthreads.  I've been thinking about it a bit more, and
my conclusion is:

(1) There isn't really any use for the N+1'th commit to start running
before the kthread_work for the N'th commit completes, so I don't mind
losing the unbound aspect of the workqueue approach
(2) For cases where there does need to be serialization between
commits on different CRTCs, since there is a per-CRTC kthread, you
could achieve this with locking

Since i915 isn't using the atomic helpers here, I suppose it is an
option for i915 to just continue doing what it is doing.

And I could ofc just stop using the atomic commit helper and do the
kthreads thing in msm. But my first preference would be that the
commit helper does generally the right thing.

BR,
-R

Ville Syrjälä Oct. 8, 2020, 8:24 a.m. UTC | #22

On Wed, Oct 07, 2020 at 09:44:09AM -0700, Rob Clark wrote:
> On Mon, Oct 5, 2020 at 5:15 AM Ville Syrjälä
> <ville.syrjala@linux.intel.com> wrote:
> >
> > On Fri, Oct 02, 2020 at 10:55:52AM -0700, Rob Clark wrote:
> > > On Fri, Oct 2, 2020 at 4:05 AM Ville Syrjälä
> > > <ville.syrjala@linux.intel.com> wrote:
> > > >
> > > > On Fri, Oct 02, 2020 at 01:52:56PM +0300, Ville Syrjälä wrote:
> > > > > On Thu, Oct 01, 2020 at 05:25:55PM +0200, Daniel Vetter wrote:
> > > > > > On Thu, Oct 1, 2020 at 5:15 PM Rob Clark <robdclark@gmail.com> wrote:
> > > > > > >
> > > > > > > I'm leaning towards converting the other drivers over to use the
> > > > > > > per-crtc kwork, and then dropping the 'commit_work` from atomic state.
> > > > > > > I can add a patch to that, but figured I could postpone that churn
> > > > > > > until there is some by-in on this whole idea.
> > > > > >
> > > > > > i915 has its own commit code, it's not even using the current commit
> > > > > > helpers (nor the commit_work). Not sure how much other fun there is.
> > > > >
> > > > > I don't think we want per-crtc threads for this in i915. Seems
> > > > > to me easier to guarantee atomicity across multiple crtcs if
> > > > > we just commit them from the same thread.
> > > >
> > > > Oh, and we may have to commit things in a very specific order
> > > > to guarantee the hw doesn't fall over, so yeah definitely per-crtc
> > > > thread is a no go.
> > >
> > > If I'm understanding the i915 code, this is only the case for modeset
> > > commits?  I suppose we could achieve the same result by just deciding
> > > to pick the kthread of the first CRTC for modeset commits.  I'm not
> > > really so much concerned about parallelism for modeset.
> >
> > I'm not entirely happy about the random differences between modesets
> > and other commits. Ideally we wouldn't need any.
> >
> > Anyways, even if we ignore modesets we still have the issue with
> > atomicity guarantees across multiple crtcs. So I think we still
> > don't want per-crtc threads, rather it should be thread for each
> > commit.
> 
> I don't really see any other way to solve the priority inversion other
> than per-CRTC kthreads.

What's the problem with just something like a dedicated commit
thread pool?

> I've been thinking about it a bit more, and
> my conclusion is:
> 
> (1) There isn't really any use for the N+1'th commit to start running
> before the kthread_work for the N'th commit completes, so I don't mind
> losing the unbound aspect of the workqueue approach
> (2) For cases where there does need to be serialization between
> commits on different CRTCs, since there is a per-CRTC kthread, you
> could achieve this with locking
> 
> Since i915 isn't using the atomic helpers here, I suppose it is an
> option for i915 to just continue doing what it is doing.
> 
> And I could ofc just stop using the atomic commit helper and do the
> kthreads thing in msm. But my first preference would be that the
> commit helper does generally the right thing.
> 
> BR,
> -R

Daniel Vetter Oct. 8, 2020, 9:10 a.m. UTC | #23

On Wed, Oct 07, 2020 at 05:30:10PM +0100, Qais Yousef wrote:
> On 10/07/20 08:57, Rob Clark wrote:
> > Yeah, I think we will end up making some use of uclamp.. there is
> > someone else working on that angle
> > 
> > But without it, this is a case that exposes legit prioritization
> > problems with commit_work which we should fix ;-)
> 
> I wasn't suggesting this as an alternative to fixing the other problem. But it
> seemed you had a different problem here that I thought I could help with :-)
> 
> I did give my opinion about how to handle that priority issue. If the 2 threads
> are kernel threads and by design they need relative priorities IMO the kernel
> need to be taught to set this relative priority. It seemed the vblank worker
> could run as SCHED_DEADLINE. If this works, then the priority problem for
> commit_work disappears as SCHED_DEADLINE will preempt RT. If commit_work uses
> sched_set_fifo(), its priority will be 50, hence your SF threads can no longer
> preempt it. And you can manage the SF threads to be any value you want relative
> to 50 anyway without having to manage commit_work itself.
> 
> I'm not sure if you have problems with RT tasks preempting important CFS
> tasks. My brain registered two conflicting statements.

I think the problem is there's two modes cros runs in: Normal cros mode,
which mostly works like a linux desktop. CFS commit work seems fine.

Other mode is android emulation, where we have the surface flinger thread
running at SCHED_FIFO. I think Rob's plan is to runtime switch priorities
to match each use case.
-Daniel

Rob Clark Oct. 16, 2020, 4:27 p.m. UTC | #24

On Thu, Oct 8, 2020 at 1:24 AM Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Wed, Oct 07, 2020 at 09:44:09AM -0700, Rob Clark wrote:
> > On Mon, Oct 5, 2020 at 5:15 AM Ville Syrjälä
> > <ville.syrjala@linux.intel.com> wrote:
> > >
> > > On Fri, Oct 02, 2020 at 10:55:52AM -0700, Rob Clark wrote:
> > > > On Fri, Oct 2, 2020 at 4:05 AM Ville Syrjälä
> > > > <ville.syrjala@linux.intel.com> wrote:
> > > > >
> > > > > On Fri, Oct 02, 2020 at 01:52:56PM +0300, Ville Syrjälä wrote:
> > > > > > On Thu, Oct 01, 2020 at 05:25:55PM +0200, Daniel Vetter wrote:
> > > > > > > On Thu, Oct 1, 2020 at 5:15 PM Rob Clark <robdclark@gmail.com> wrote:
> > > > > > > >
> > > > > > > > I'm leaning towards converting the other drivers over to use the
> > > > > > > > per-crtc kwork, and then dropping the 'commit_work` from atomic state.
> > > > > > > > I can add a patch to that, but figured I could postpone that churn
> > > > > > > > until there is some by-in on this whole idea.
> > > > > > >
> > > > > > > i915 has its own commit code, it's not even using the current commit
> > > > > > > helpers (nor the commit_work). Not sure how much other fun there is.
> > > > > >
> > > > > > I don't think we want per-crtc threads for this in i915. Seems
> > > > > > to me easier to guarantee atomicity across multiple crtcs if
> > > > > > we just commit them from the same thread.
> > > > >
> > > > > Oh, and we may have to commit things in a very specific order
> > > > > to guarantee the hw doesn't fall over, so yeah definitely per-crtc
> > > > > thread is a no go.
> > > >
> > > > If I'm understanding the i915 code, this is only the case for modeset
> > > > commits?  I suppose we could achieve the same result by just deciding
> > > > to pick the kthread of the first CRTC for modeset commits.  I'm not
> > > > really so much concerned about parallelism for modeset.
> > >
> > > I'm not entirely happy about the random differences between modesets
> > > and other commits. Ideally we wouldn't need any.
> > >
> > > Anyways, even if we ignore modesets we still have the issue with
> > > atomicity guarantees across multiple crtcs. So I think we still
> > > don't want per-crtc threads, rather it should be thread for each
> > > commit.
> >
> > I don't really see any other way to solve the priority inversion other
> > than per-CRTC kthreads.
>
> What's the problem with just something like a dedicated commit
> thread pool?

partly, I was trying to avoid re-implementing workqueue.  And partly
the thread-pool approach seems like it would be harder for userspace
to find the tasks which need priority adjustment.

But each CRTC is essentially a FIFO, pageflip N+1 on a given CRTC will
happen after pageflip N.

BR,
-R

> > I've been thinking about it a bit more, and
> > my conclusion is:
> >
> > (1) There isn't really any use for the N+1'th commit to start running
> > before the kthread_work for the N'th commit completes, so I don't mind
> > losing the unbound aspect of the workqueue approach
> > (2) For cases where there does need to be serialization between
> > commits on different CRTCs, since there is a per-CRTC kthread, you
> > could achieve this with locking
> >
> > Since i915 isn't using the atomic helpers here, I suppose it is an
> > option for i915 to just continue doing what it is doing.
> >
> > And I could ofc just stop using the atomic commit helper and do the
> > kthreads thing in msm. But my first preference would be that the
> > commit helper does generally the right thing.
> >
> > BR,
> > -R
>
> --
> Ville Syrjälä
> Intel

[v2,0/3] drm: commit_work scheduling

Message

Comments