Message ID: 20230912021615.2086698-13-matthew.brost@intel.com (mailing list archive)
State: New, archived
Series: DRM scheduler changes for Xe
On 12.09.23 04:16, Matthew Brost wrote:
> Provide documentation to guide in ways to tear down an entity.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  Documentation/gpu/drm-mm.rst             |  6 ++++++
>  drivers/gpu/drm/scheduler/sched_entity.c | 19 +++++++++++++++++++
>  2 files changed, 25 insertions(+)
>
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index c19b34b1c0ed..cb4d6097897e 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -552,6 +552,12 @@ Overview
>  .. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
>     :doc: Overview
>
> +Entity teardown
> +---------------
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_entity.c
> +   :doc: Entity teardown
> +
>  Scheduler Function References
>  -----------------------------
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 37557fbb96d0..76f3e10218bb 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -21,6 +21,25 @@
>   *
>   */
>
> +/**
> + * DOC: Entity teardown
> + *
> + * Drivers can tear down an entity for several reasons. Reasons typically
> + * are a user closes the entity via an IOCTL, the FD associated with the entity
> + * is closed, or the entity encounters an error. The GPU scheduler provides the
> + * basic infrastructure to do this in a few different ways.
> + *
> + * 1. Let the entity run dry (both the pending list and job queue) and then call
> + * drm_sched_entity_fini. The backend can accelerate the process of running dry.
> + * For example set a flag so run_job is a NOP and set the TDR to a low value to
> + * signal all jobs in a timely manner (this example works for
> + * DRM_SCHED_POLICY_SINGLE_ENTITY).

Please note that it is a requirement from the X server that all externally
visible effects of command submission must still be visible even after the
fd is closed.

This has given us tons of headaches and is one of the reasons we have the
drm_sched_entity_flush() handling in the first place.

As long as you don't care about X server compatibility that shouldn't
matter to you.

Regards,
Christian.

> + *
> + * 2. Kill the entity directly via drm_sched_entity_flush /
> + * drm_sched_entity_fini ensuring all pending and queued jobs are off the
> + * hardware and signaled.
> + */
> +
>  #include <linux/kthread.h>
>  #include <linux/slab.h>
>  #include <linux/completion.h>
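To make the accelerated run-dry of method 1 concrete, here is a minimal
driver-side sketch. All foo_* names and the banned flag are assumptions for
illustration, not scheduler API; the sketch relies on the scheduler treating
a NULL return from run_job as a job that finishes immediately.

#include <linux/container_of.h>
#include <drm/gpu_scheduler.h>

struct foo_entity {
	struct drm_sched_entity base;
	bool banned;			/* set once teardown starts */
};

struct foo_job {
	struct drm_sched_job base;
	struct foo_entity *fe;
};

/* Hypothetical hardware submission helper. */
static struct dma_fence *foo_submit_to_hw(struct foo_job *job);

static struct dma_fence *foo_run_job(struct drm_sched_job *sched_job)
{
	struct foo_job *job = container_of(sched_job, struct foo_job, base);

	/* NOP once banned: complete the job without touching the HW. */
	if (READ_ONCE(job->fe->banned))
		return NULL;

	return foo_submit_to_hw(job);
}

static void foo_entity_close(struct foo_entity *fe)
{
	WRITE_ONCE(fe->banned, true);
	/* Driver-specific wait for the pending list and job queue to run
	 * dry elided here; only then finalize the entity. */
	drm_sched_entity_fini(&fe->base);
}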
On 2023-09-11 22:16, Matthew Brost wrote:
> Provide documentation to guide in ways to tear down an entity.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  Documentation/gpu/drm-mm.rst             |  6 ++++++
>  drivers/gpu/drm/scheduler/sched_entity.c | 19 +++++++++++++++++++
>  2 files changed, 25 insertions(+)
>
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index c19b34b1c0ed..cb4d6097897e 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -552,6 +552,12 @@ Overview
>  .. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
>     :doc: Overview
>
> +Entity teardown
> +---------------
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_entity.c
> +   :doc: Entity teardown
> +
>  Scheduler Function References
>  -----------------------------
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 37557fbb96d0..76f3e10218bb 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -21,6 +21,25 @@
>   *
>   */
>
> +/**
> + * DOC: Entity teardown
> + *
> + * Drivers can tear down an entity for several reasons. Reasons typically
> + * are a user closes the entity via an IOCTL, the FD associated with the entity
> + * is closed, or the entity encounters an error.

So in this third case, "entity encounters an error", we need to make sure
that no new jobs are being pushed to the entity, or at least say that here.

IOW, in all three cases, the common denominator (requirement?) is that no
new jobs are being pushed to the entity, i.e. that there are no incoming
jobs.

> The GPU scheduler provides the
> + * basic infrastructure to do this in a few different ways.

Well, I'd say "in two different ways." or "in the following two ways."
I'd rather have "two" in there to make sure that it is these two, and not
any more/less/etc.

> + *
> + * 1. Let the entity run dry (both the pending list and job queue) and then call
> + * drm_sched_entity_fini. The backend can accelerate the process of running dry.
> + * For example set a flag so run_job is a NOP and set the TDR to a low value to
> + * signal all jobs in a timely manner (this example works for
> + * DRM_SCHED_POLICY_SINGLE_ENTITY).
> + *
> + * 2. Kill the entity directly via drm_sched_entity_flush /
> + * drm_sched_entity_fini ensuring all pending and queued jobs are off the
> + * hardware and signaled.
> + */
> +
>  #include <linux/kthread.h>
>  #include <linux/slab.h>
>  #include <linux/completion.h>
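A hedged sketch of that common denominator, continuing the hypothetical
foo_* driver from the earlier sketch: every submission path checks the
teardown flag before a job is created, so no new jobs reach the entity
once teardown has begun.

static int foo_submit_ioctl(struct foo_entity *fe)
{
	/* Reject incoming jobs once teardown has started. */
	if (READ_ONCE(fe->banned))
		return -ENOENT;

	/* ... drm_sched_job_init(), drm_sched_job_arm(),
	 * drm_sched_entity_push_job() ... */
	return 0;
}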
On 9/12/23 04:16, Matthew Brost wrote:
> Provide documentation to guide in ways to tear down an entity.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  Documentation/gpu/drm-mm.rst             |  6 ++++++
>  drivers/gpu/drm/scheduler/sched_entity.c | 19 +++++++++++++++++++
>  2 files changed, 25 insertions(+)
>
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index c19b34b1c0ed..cb4d6097897e 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -552,6 +552,12 @@ Overview
>  .. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
>     :doc: Overview
>
> +Entity teardown
> +---------------

While I think it is good to document this as well, my concern was more
about tearing down the drm_gpu_scheduler. (See also my response to patch
11 of this series.)

How do we ensure that the pending_list is actually empty before calling
drm_sched_fini()? If we don't, we potentially leak memory.

For instance, we could let drm_sched_fini() (or a separate
drm_sched_teardown()) cancel run work first and leave free work running
until the pending_list is empty.

If we think drivers should take care of this themselves (e.g. through
reference counting jobs per scheduler), we should document this and
explain why we can't have the scheduler do this for us.

> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_entity.c
> +   :doc: Entity teardown
> +
>  Scheduler Function References
>  -----------------------------
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 37557fbb96d0..76f3e10218bb 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -21,6 +21,25 @@
>   *
>   */
>
> +/**
> + * DOC: Entity teardown
> + *
> + * Drivers can tear down an entity for several reasons. Reasons typically
> + * are a user closes the entity via an IOCTL, the FD associated with the entity
> + * is closed, or the entity encounters an error. The GPU scheduler provides the
> + * basic infrastructure to do this in a few different ways.
> + *
> + * 1. Let the entity run dry (both the pending list and job queue) and then call
> + * drm_sched_entity_fini. The backend can accelerate the process of running dry.
> + * For example set a flag so run_job is a NOP and set the TDR to a low value to
> + * signal all jobs in a timely manner (this example works for
> + * DRM_SCHED_POLICY_SINGLE_ENTITY).
> + *
> + * 2. Kill the entity directly via drm_sched_entity_flush /
> + * drm_sched_entity_fini ensuring all pending and queued jobs are off the
> + * hardware and signaled.
> + */
> +
>  #include <linux/kthread.h>
>  #include <linux/slab.h>
>  #include <linux/completion.h>
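One possible shape of the driver-side reference counting mentioned above,
as a sketch only: the foo_* wrapper and the bias-by-one counting scheme are
assumptions, not existing scheduler API. The driver counts jobs per
scheduler and calls drm_sched_fini() only once the pending_list has
drained, so no job memory is leaked.

#include <linux/atomic.h>
#include <linux/completion.h>
#include <linux/slab.h>
#include <drm/gpu_scheduler.h>

struct foo_sched {
	struct drm_gpu_scheduler base;
	atomic_t jobs;			/* jobs not yet freed, biased by 1 */
	struct completion drained;
};

/* At scheduler creation:
 *	atomic_set(&fs->jobs, 1);
 *	init_completion(&fs->drained);
 */

/* Called from the submission path for every job handed to the scheduler. */
static void foo_job_get(struct foo_sched *fs)
{
	atomic_inc(&fs->jobs);
}

/* &drm_sched_backend_ops.free_job: drop the per-scheduler count. */
static void foo_free_job(struct drm_sched_job *sched_job)
{
	struct foo_sched *fs =
		container_of(sched_job->sched, struct foo_sched, base);

	drm_sched_job_cleanup(sched_job);
	kfree(sched_job);		/* driver-owned job allocation */

	if (atomic_dec_and_test(&fs->jobs))
		complete(&fs->drained);
}

static void foo_sched_teardown(struct foo_sched *fs)
{
	/* Drop the initial bias; wait until free_job has run for every
	 * job still on the pending_list. */
	if (!atomic_dec_and_test(&fs->jobs))
		wait_for_completion(&fs->drained);
	drm_sched_fini(&fs->base);
}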
diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index c19b34b1c0ed..cb4d6097897e 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -552,6 +552,12 @@ Overview
 .. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
    :doc: Overview
 
+Entity teardown
+---------------
+
+.. kernel-doc:: drivers/gpu/drm/scheduler/sched_entity.c
+   :doc: Entity teardown
+
 Scheduler Function References
 -----------------------------
 
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 37557fbb96d0..76f3e10218bb 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -21,6 +21,25 @@
  *
  */
 
+/**
+ * DOC: Entity teardown
+ *
+ * Drivers can tear down an entity for several reasons. Reasons typically
+ * are a user closes the entity via an IOCTL, the FD associated with the entity
+ * is closed, or the entity encounters an error. The GPU scheduler provides the
+ * basic infrastructure to do this in a few different ways.
+ *
+ * 1. Let the entity run dry (both the pending list and job queue) and then call
+ * drm_sched_entity_fini. The backend can accelerate the process of running dry.
+ * For example set a flag so run_job is a NOP and set the TDR to a low value to
+ * signal all jobs in a timely manner (this example works for
+ * DRM_SCHED_POLICY_SINGLE_ENTITY).
+ *
+ * 2. Kill the entity directly via drm_sched_entity_flush /
+ * drm_sched_entity_fini ensuring all pending and queued jobs are off the
+ * hardware and signaled.
+ */
+
 #include <linux/kthread.h>
 #include <linux/slab.h>
 #include <linux/completion.h>
Provide documentation to guide in ways to tear down an entity.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 Documentation/gpu/drm-mm.rst             |  6 ++++++
 drivers/gpu/drm/scheduler/sched_entity.c | 19 +++++++++++++++++++
 2 files changed, 25 insertions(+)
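For completeness, a minimal sketch of teardown method 2 from the DOC
comment above: kill the entity directly via flush and fini. The timeout
value and foo_* name are assumed placeholders for illustration.

#include <linux/jiffies.h>
#include <drm/gpu_scheduler.h>

#define FOO_FLUSH_TIMEOUT	msecs_to_jiffies(1000)	/* assumed value */

static void foo_entity_kill(struct drm_sched_entity *entity)
{
	/* Wait (up to the timeout) for the entity's queued jobs to drain. */
	drm_sched_entity_flush(entity, FOO_FLUSH_TIMEOUT);

	/* Remove the entity from its runqueue; any jobs still queued are
	 * killed and their fences signaled with an error. */
	drm_sched_entity_fini(entity);
}

Note that the scheduler also provides drm_sched_entity_destroy(), which
wraps this flush-then-fini sequence with a default timeout.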