| Message ID | 20231113123832.120710-1-christian.koenig@amd.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | drm/scheduler: improve GPU scheduler documentation |
Hi Christian, On 11/13/23 13:38, Christian König wrote: > Start to improve the scheduler document. Especially document the > lifetime of each of the objects as well as the restrictions around > DMA-fence handling and userspace compatibility. Thanks a lot for submitting this - it's very much appreciated! Before reviewing in detail, do you mind to re-structure this a little bit? Instead of packing everything in an enumeration I'd suggest to have separate DOC paragraphs. For instance: - keep "Overview" to introduce the overall idea and basic structures of the component - a paragraph for each of those basic structures (drm_gpu_scheduler, drm_sched_entity, drm_sched_fence) explaining their purpose and lifetime - a paragraph about the pitfalls dealing with DMA fences - a paragraph about the pitfalls of the driver callbacks (although this might highly intersect with the previous suggested one) I feel like this would be much easier to read. Besides that, which covers the conceptual side of things, I think we also need to improve the documentation on what the scheduler implementation expects from drivers, e.g. zero initialize structures, valid initialization parameters for typical use cases, etc. However, that's for a separate patch. - Danilo > > Signed-off-by: Christian König <christian.koenig@amd.com> > --- > drivers/gpu/drm/scheduler/sched_main.c | 126 ++++++++++++++++++++----- > 1 file changed, 104 insertions(+), 22 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index 506371c42745..36a7c5dc852d 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -24,28 +24,110 @@ > /** > * DOC: Overview > * > - * The GPU scheduler provides entities which allow userspace to push jobs > - * into software queues which are then scheduled on a hardware run queue. > - * The software queues have a priority among them. The scheduler selects the entities > - * from the run queue using a FIFO. The scheduler provides dependency handling > - * features among jobs. The driver is supposed to provide callback functions for > - * backend operations to the scheduler like submitting a job to hardware run queue, > - * returning the dependencies of a job etc. > - * > - * The organisation of the scheduler is the following: > - * > - * 1. Each hw run queue has one scheduler > - * 2. Each scheduler has multiple run queues with different priorities > - * (e.g., HIGH_HW,HIGH_SW, KERNEL, NORMAL) > - * 3. Each scheduler run queue has a queue of entities to schedule > - * 4. Entities themselves maintain a queue of jobs that will be scheduled on > - * the hardware. > - * > - * The jobs in a entity are always scheduled in the order that they were pushed. > - * > - * Note that once a job was taken from the entities queue and pushed to the > - * hardware, i.e. the pending queue, the entity must not be referenced anymore > - * through the jobs entity pointer. > + * The GPU scheduler implements some logic to decide which command submission > + * to push next to the hardware. Another major use case for the GPU scheduler > + * is to enforce correct driver behavior around those command submission. > + * Because of this it's also used by drivers which don't need the actual > + * scheduling functionality. > + * > + * To fulfill this task the GPU scheduler uses of the following objects: > + * > + * 1. The job object which contains a bunch of dependencies in the form of > + * DMA-fence objects. 
Drivers can also implement an optional prepare_job > + * callback which returns additional dependencies as DMA-fence objects. > + * It's important to note that this callback must follow the DMA-fence rules, > + * so it can't easily allocate memory or grab locks under which memory is > + * allocated. Drivers should use this as base class for an object which > + * contains the necessary state to push the command submission to the > + * hardware. > + * > + * The lifetime of the job object should at least be from pushing it into the > + * scheduler until the scheduler notes through the free callback that a job > + * isn't needed any more. Drivers can of course keep their job object alive > + * longer than that, but that's outside of the scope of the scheduler > + * component. Job initialization is split into two parts, > + * drm_sched_job_init() and drm_sched_job_arm(). It's important to note that > + * after arming a job drivers must follow the DMA-fence rules and can't > + * easily allocate memory or takes locks under which memory is allocated. > + * > + * 2. The entity object which is a container for jobs which should execute > + * sequentially. Drivers should create an entity for each individual context > + * they maintain for command submissions which can run in parallel. > + * > + * The lifetime of the entity should *not* exceed the lifetime of the > + * userspace process it was created for and drivers should call the > + * drm_sched_entity_flush() function from their file_operations.flush > + * callback. Background is that for compatibility reasons with existing > + * userspace all results of a command submission should become visible > + * externally even after after a process exits. The only exception to that > + * is when the process is actively killed by a SIGKILL. In this case the > + * entity object makes sure that jobs are freed without running them while > + * still maintaining correct sequential order for signaling fences. So it's > + * possible that an entity object is not alive any more while jobs from it > + * are still running on the hardware. > + * > + * 3. The hardware fence object which is a DMA-fence provided by the driver as > + * result of running jobs. Drivers need to make sure that the normal > + * DMA-fence semantics are followed for this object. It's important to note > + * that the memory for this object can *not* be allocated in the run_job > + * callback since that would violate the requirements for the DMA-fence > + * implementation. The scheduler maintains a timeout handler which triggers > + * if this fence doesn't signal in a configurable time frame. > + * > + * The lifetime of this object follows DMA-fence ref-counting rules, the > + * scheduler takes ownership of the reference returned by the driver and > + * drops it when it's not needed any more. Errors should also be signaled > + * through the hardware fence and are bubbled up back to the scheduler fence > + * and entity. > + * > + * 4. The scheduler fence object which encapsulates the whole time from pushing > + * the job into the scheduler until the hardware has finished processing it. > + * This is internally managed by the scheduler, but drivers can grab > + * additional reference to it after arming a job. The implementation > + * provides DMA-fence interfaces for signaling both scheduling of a command > + * submission as well as finishing of processing. > + * > + * The lifetime of this object also follows normal DMA-fence ref-counting > + * rules. 
The finished fence is the one normally exposed outside of the > + * scheduler, but the driver can grab references to both the scheduled as > + * well as the finished fence when needed for pipe-lining optimizations. > + * > + * 5. The run queue object which is a container of entities for a certain > + * priority level. The lifetime of those objects are bound to the scheduler > + * lifetime. > + * > + * This is internally managed by the scheduler and drivers shouldn't touch > + * them directly. > + * > + * 6. The scheduler object itself which does the actual work of selecting a job > + * and pushing it to the hardware. Both FIFO and RR selection algorithm are > + * supported, but FIFO is preferred for many use cases. > + * > + * The lifetime of this object is managed by the driver using it. Before > + * destroying the scheduler the driver must ensure that all hardware > + * processing involving this scheduler object has finished by calling for > + * example disable_irq(). It is *not* sufficient to wait for the hardware > + * fence here since this doesn't guarantee that all callback processing has > + * finished. > + * > + * All callbacks the driver needs to implement are restricted by DMA-fence > + * signaling rules to guarantee deadlock free forward progress. This especially > + * means that for normal operation no memory can be allocated. All memory which > + * is needed for pushing the job to the hardware must be allocated before > + * arming a job. It also means that no locks can be taken under which memory > + * might be allocated as well. > + * > + * Memory which is optional to allocate for device core dumping or debugging > + * *must* be allocated with GFP_NOWAIT and appropriate error handling taking if > + * that allocation fails. GFP_ATOMIC should only be used if absolutely > + * necessary since dipping into the special atomic reserves is usually not > + * justified for a GPU driver. > + * > + * The scheduler also used to provided functionality for re-submitting jobs > + * with replacing the hardware fence during reset handling. This functionality > + * is now marked as deprecated since this has proven to be fundamentally racy > + * and not compatible with DMA-fence rules and shouldn't be used in any new > + * code. > */ > > #include <linux/kthread.h>
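To make the drm_sched_job_init()/drm_sched_job_arm() split described in the quoted patch a bit more concrete, here is a minimal driver-side sketch. All my_* names are invented for illustration, and the drm_sched_* signatures are roughly those of the kernel this series was posted against, so treat it as a sketch rather than reference code.

```c
#include <drm/gpu_scheduler.h>
#include <linux/slab.h>

/* Hypothetical driver job, using struct drm_sched_job as base class. */
struct my_job {
	struct drm_sched_job base;
	void *ring_data;	/* state needed to push the submission to HW */
};

static int my_driver_submit(struct drm_sched_entity *entity,
			    struct dma_fence *in_fence, void *owner)
{
	struct my_job *job;
	int ret;

	/* Everything that needs memory happens before drm_sched_job_arm(). */
	job = kzalloc(sizeof(*job), GFP_KERNEL);
	if (!job)
		return -ENOMEM;

	ret = drm_sched_job_init(&job->base, entity, owner);
	if (ret)
		goto err_free;

	/* Known dependencies are added here; the call consumes the fence
	 * reference, even on failure. */
	if (in_fence) {
		ret = drm_sched_job_add_dependency(&job->base, in_fence);
		if (ret)
			goto err_cleanup;
	}

	/*
	 * From drm_sched_job_arm() on the DMA-fence rules apply: no memory
	 * allocation and no locks under which memory is allocated.
	 */
	drm_sched_job_arm(&job->base);
	drm_sched_entity_push_job(&job->base);
	return 0;

err_cleanup:
	drm_sched_job_cleanup(&job->base);
err_free:
	kfree(job);
	return ret;
}
```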
Am 13.11.23 um 14:14 schrieb Danilo Krummrich: > Hi Christian, > > On 11/13/23 13:38, Christian König wrote: >> Start to improve the scheduler document. Especially document the >> lifetime of each of the objects as well as the restrictions around >> DMA-fence handling and userspace compatibility. > > Thanks a lot for submitting this - it's very much appreciated! > > Before reviewing in detail, do you mind to re-structure this a little bit? Not the slightest. I'm not a native speaker of English and generally not very good at writing documentation. > Instead > of packing everything in an enumeration I'd suggest to have separate > DOC paragraphs. > > For instance: > > - keep "Overview" to introduce the overall idea and basic structures > of the component > - a paragraph for each of those basic structures (drm_gpu_scheduler, > drm_sched_entity, > drm_sched_fence) explaining their purpose and lifetime > - a paragraph about the pitfalls dealing with DMA fences > - a paragraph about the pitfalls of the driver callbacks (although > this might highly > intersect with the previous suggested one) > > I feel like this would be much easier to read. Going to give that a try. > > Besides that, which covers the conceptual side of things, I think we > also need to > improve the documentation on what the scheduler implementation expects > from drivers, > e.g. zero initialize structures, valid initialization parameters for > typical use cases, > etc. However, that's for a separate patch. Yeah, each individual function should have kerneldoc attached to it. I think we should also try to deprecate more of the hacks AMD came up. Especially the error and GPU reset handling is more than messed up. Regards, Christian. > > - Danilo > >> >> Signed-off-by: Christian König <christian.koenig@amd.com> >> --- >> drivers/gpu/drm/scheduler/sched_main.c | 126 ++++++++++++++++++++----- >> 1 file changed, 104 insertions(+), 22 deletions(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >> b/drivers/gpu/drm/scheduler/sched_main.c >> index 506371c42745..36a7c5dc852d 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -24,28 +24,110 @@ >> /** >> * DOC: Overview >> * >> - * The GPU scheduler provides entities which allow userspace to push >> jobs >> - * into software queues which are then scheduled on a hardware run >> queue. >> - * The software queues have a priority among them. The scheduler >> selects the entities >> - * from the run queue using a FIFO. The scheduler provides >> dependency handling >> - * features among jobs. The driver is supposed to provide callback >> functions for >> - * backend operations to the scheduler like submitting a job to >> hardware run queue, >> - * returning the dependencies of a job etc. >> - * >> - * The organisation of the scheduler is the following: >> - * >> - * 1. Each hw run queue has one scheduler >> - * 2. Each scheduler has multiple run queues with different priorities >> - * (e.g., HIGH_HW,HIGH_SW, KERNEL, NORMAL) >> - * 3. Each scheduler run queue has a queue of entities to schedule >> - * 4. Entities themselves maintain a queue of jobs that will be >> scheduled on >> - * the hardware. >> - * >> - * The jobs in a entity are always scheduled in the order that they >> were pushed. >> - * >> - * Note that once a job was taken from the entities queue and pushed >> to the >> - * hardware, i.e. the pending queue, the entity must not be >> referenced anymore >> - * through the jobs entity pointer. 
>> + * The GPU scheduler implements some logic to decide which command >> submission >> + * to push next to the hardware. Another major use case for the GPU >> scheduler >> + * is to enforce correct driver behavior around those command >> submission. >> + * Because of this it's also used by drivers which don't need the >> actual >> + * scheduling functionality. >> + * >> + * To fulfill this task the GPU scheduler uses of the following >> objects: >> + * >> + * 1. The job object which contains a bunch of dependencies in the >> form of >> + * DMA-fence objects. Drivers can also implement an optional >> prepare_job >> + * callback which returns additional dependencies as DMA-fence >> objects. >> + * It's important to note that this callback must follow the >> DMA-fence rules, >> + * so it can't easily allocate memory or grab locks under which >> memory is >> + * allocated. Drivers should use this as base class for an object >> which >> + * contains the necessary state to push the command submission to >> the >> + * hardware. >> + * >> + * The lifetime of the job object should at least be from pushing >> it into the >> + * scheduler until the scheduler notes through the free callback >> that a job >> + * isn't needed any more. Drivers can of course keep their job >> object alive >> + * longer than that, but that's outside of the scope of the >> scheduler >> + * component. Job initialization is split into two parts, >> + * drm_sched_job_init() and drm_sched_job_arm(). It's important >> to note that >> + * after arming a job drivers must follow the DMA-fence rules and >> can't >> + * easily allocate memory or takes locks under which memory is >> allocated. >> + * >> + * 2. The entity object which is a container for jobs which should >> execute >> + * sequentially. Drivers should create an entity for each >> individual context >> + * they maintain for command submissions which can run in parallel. >> + * >> + * The lifetime of the entity should *not* exceed the lifetime of >> the >> + * userspace process it was created for and drivers should call the >> + * drm_sched_entity_flush() function from their >> file_operations.flush >> + * callback. Background is that for compatibility reasons with >> existing >> + * userspace all results of a command submission should become >> visible >> + * externally even after after a process exits. The only >> exception to that >> + * is when the process is actively killed by a SIGKILL. In this >> case the >> + * entity object makes sure that jobs are freed without running >> them while >> + * still maintaining correct sequential order for signaling >> fences. So it's >> + * possible that an entity object is not alive any more while >> jobs from it >> + * are still running on the hardware. >> + * >> + * 3. The hardware fence object which is a DMA-fence provided by the >> driver as >> + * result of running jobs. Drivers need to make sure that the normal >> + * DMA-fence semantics are followed for this object. It's >> important to note >> + * that the memory for this object can *not* be allocated in the >> run_job >> + * callback since that would violate the requirements for the >> DMA-fence >> + * implementation. The scheduler maintains a timeout handler >> which triggers >> + * if this fence doesn't signal in a configurable time frame. >> + * >> + * The lifetime of this object follows DMA-fence ref-counting >> rules, the >> + * scheduler takes ownership of the reference returned by the >> driver and >> + * drops it when it's not needed any more. 
Errors should also be >> signaled >> + * through the hardware fence and are bubbled up back to the >> scheduler fence >> + * and entity. >> + * >> + * 4. The scheduler fence object which encapsulates the whole time >> from pushing >> + * the job into the scheduler until the hardware has finished >> processing it. >> + * This is internally managed by the scheduler, but drivers can grab >> + * additional reference to it after arming a job. The implementation >> + * provides DMA-fence interfaces for signaling both scheduling of >> a command >> + * submission as well as finishing of processing. >> + * >> + * The lifetime of this object also follows normal DMA-fence >> ref-counting >> + * rules. The finished fence is the one normally exposed outside >> of the >> + * scheduler, but the driver can grab references to both the >> scheduled as >> + * well as the finished fence when needed for pipe-lining >> optimizations. >> + * >> + * 5. The run queue object which is a container of entities for a >> certain >> + * priority level. The lifetime of those objects are bound to the >> scheduler >> + * lifetime. >> + * >> + * This is internally managed by the scheduler and drivers >> shouldn't touch >> + * them directly. >> + * >> + * 6. The scheduler object itself which does the actual work of >> selecting a job >> + * and pushing it to the hardware. Both FIFO and RR selection >> algorithm are >> + * supported, but FIFO is preferred for many use cases. >> + * >> + * The lifetime of this object is managed by the driver using it. >> Before >> + * destroying the scheduler the driver must ensure that all hardware >> + * processing involving this scheduler object has finished by >> calling for >> + * example disable_irq(). It is *not* sufficient to wait for the >> hardware >> + * fence here since this doesn't guarantee that all callback >> processing has >> + * finished. >> + * >> + * All callbacks the driver needs to implement are restricted by >> DMA-fence >> + * signaling rules to guarantee deadlock free forward progress. This >> especially >> + * means that for normal operation no memory can be allocated. All >> memory which >> + * is needed for pushing the job to the hardware must be allocated >> before >> + * arming a job. It also means that no locks can be taken under >> which memory >> + * might be allocated as well. >> + * >> + * Memory which is optional to allocate for device core dumping or >> debugging >> + * *must* be allocated with GFP_NOWAIT and appropriate error >> handling taking if >> + * that allocation fails. GFP_ATOMIC should only be used if absolutely >> + * necessary since dipping into the special atomic reserves is >> usually not >> + * justified for a GPU driver. >> + * >> + * The scheduler also used to provided functionality for >> re-submitting jobs >> + * with replacing the hardware fence during reset handling. This >> functionality >> + * is now marked as deprecated since this has proven to be >> fundamentally racy >> + * and not compatible with DMA-fence rules and shouldn't be used in >> any new >> + * code. >> */ >> #include <linux/kthread.h> >
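The entity lifetime rule discussed above (call drm_sched_entity_flush() from the file_operations.flush callback so command submission results stay visible after a normal process exit) could look roughly like this in a driver. struct my_file_priv, MY_FLUSH_TIMEOUT and the open/close helpers are invented; only the drm_sched_entity_* calls are real scheduler API.

```c
#include <drm/gpu_scheduler.h>
#include <linux/fs.h>

#define MY_FLUSH_TIMEOUT	(HZ * 10)	/* hypothetical timeout */

struct my_file_priv {
	struct drm_sched_entity entity;	/* one entity per userspace context */
};

static int my_open_context(struct my_file_priv *fpriv,
			   struct drm_gpu_scheduler *sched)
{
	struct drm_gpu_scheduler *sched_list[] = { sched };

	return drm_sched_entity_init(&fpriv->entity, DRM_SCHED_PRIORITY_NORMAL,
				     sched_list, ARRAY_SIZE(sched_list), NULL);
}

static int my_flush(struct file *file, fl_owner_t id)
{
	struct my_file_priv *fpriv = file->private_data;

	/* Waits for queued jobs unless the process was killed by SIGKILL,
	 * in which case remaining jobs are dropped in order. */
	drm_sched_entity_flush(&fpriv->entity, MY_FLUSH_TIMEOUT);
	return 0;
}

static void my_close_context(struct my_file_priv *fpriv)
{
	/* Tears down the entity; pairs with drm_sched_entity_init(). */
	drm_sched_entity_destroy(&fpriv->entity);
}
```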
On 2023-11-13 07:38, Christian König wrote: > Start to improve the scheduler document. Especially document the > lifetime of each of the objects as well as the restrictions around > DMA-fence handling and userspace compatibility. Thanks Christian for doing this--much needed. > > Signed-off-by: Christian König <christian.koenig@amd.com> > --- > drivers/gpu/drm/scheduler/sched_main.c | 126 ++++++++++++++++++++----- > 1 file changed, 104 insertions(+), 22 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index 506371c42745..36a7c5dc852d 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -24,28 +24,110 @@ > /** > * DOC: Overview > * > - * The GPU scheduler provides entities which allow userspace to push jobs > - * into software queues which are then scheduled on a hardware run queue. > - * The software queues have a priority among them. The scheduler selects the entities > - * from the run queue using a FIFO. The scheduler provides dependency handling > - * features among jobs. The driver is supposed to provide callback functions for > - * backend operations to the scheduler like submitting a job to hardware run queue, > - * returning the dependencies of a job etc. So, I don't mind this paragraph, as it provides an overview or the relationship between a DRM GPU scheduler, entities, run-queues, and jobs. > - * > - * The organisation of the scheduler is the following: > - * > - * 1. Each hw run queue has one scheduler > - * 2. Each scheduler has multiple run queues with different priorities > - * (e.g., HIGH_HW,HIGH_SW, KERNEL, NORMAL) > - * 3. Each scheduler run queue has a queue of entities to schedule > - * 4. Entities themselves maintain a queue of jobs that will be scheduled on > - * the hardware. This is also good, and shouldn't have been deleted. > - * > - * The jobs in a entity are always scheduled in the order that they were pushed. I'd have said here "jobs within an entity". Again shouldn't have been deleted. This is good overall/overview information. > - * > - * Note that once a job was taken from the entities queue and pushed to the > - * hardware, i.e. the pending queue, the entity must not be referenced anymore > - * through the jobs entity pointer. > + * The GPU scheduler implements some logic to decide which command submission > + * to push next to the hardware. Another major use case for the GPU scheduler > + * is to enforce correct driver behavior around those command submission. The GPU scheduler also enforces correct driver behaviour around those command submissions. > + * Because of this it's also used by drivers which don't need the actual > + * scheduling functionality. ... but need to push jobs into their firmware/hardware and maintain/keep correct DRM dependencies in the form of "fences". > + * > + * To fulfill this task the GPU scheduler uses of the following objects: > + * > + * 1. The job object which contains a bunch of dependencies in the form of Drop "which". Instead of listing what it contains, how it is used, what it does, explain what it is: work/task to be executed by the GPU. _Then_ you can start listing what it contains, how it is used, what it does. > + * DMA-fence objects. Drivers can also implement an optional prepare_job > + * callback which returns additional dependencies as DMA-fence objects. "can also"? 
This would usually follow if the other callbacks/etc., have been described and they haven't, so I'd say "Drivers implement an optional prepare_job callback,..." > + * It's important to note that this callback must follow the DMA-fence rules, > + * so it can't easily allocate memory or grab locks under which memory is > + * allocated. Drivers should use this as base class for an object which > + * contains the necessary state to push the command submission to the > + * hardware. > + * > + * The lifetime of the job object should at least be from pushing it into the > + * scheduler until the scheduler notes through the free callback that a job > + * isn't needed any more. Drivers can of course keep their job object alive > + * longer than that, but that's outside of the scope of the scheduler > + * component. [New paragraph starts describing the job initialization.] Add: Job initialization is split into two parts, > + * drm_sched_job_init() and drm_sched_job_arm(). Perhaps we should mention briefly what each one does..? Add: It's important to note that > + * after arming a job drivers must follow the DMA-fence rules and can't > + * easily allocate memory or takes locks under which memory is allocated. > + * > + * 2. The entity object which is a container for jobs which should execute Drop "which". "The entity object is a container of ..." > + * sequentially. Drivers should create an entity for each individual context > + * they maintain for command submissions which can run in parallel. This isn't very clear, but can be made so by: "they maintain for command submissions." "Entities' jobs can run in parallel as facilitated by the GPU." > + * > + * The lifetime of the entity should *not* exceed the lifetime of the > + * userspace process it was created for and drivers should call the > + * drm_sched_entity_flush() function from their file_operations.flush > + * callback. Okay... they should... WHEN? When we use "should do something" we always clarify with "when xyz happens." Add: Background is that for compatibility reasons with existing > + * userspace all results of a command submission should become visible "Background" --> "The reason for this is for compatibility with existing ..." > + * externally even after after a process exits. The only exception to that Remove one of the two "after". > + * is when the process is actively killed by a SIGKILL. In this case the > + * entity object makes sure that jobs are freed without running them while > + * still maintaining correct sequential order for signaling fences. So it's > + * possible that an entity object is not alive any more while jobs from it "to not be alive" (This paragraph here, including SIGKILL, could be made clearer, by splitting it in two parts. One describing normal behaviour, i.e. SIGTERM, and the other describing SIGKILL.) > + * are still running on the hardware. > + * > + * 3. The hardware fence object which is a DMA-fence provided by the driver as Drop "which". "The hardware fence object is a DMA-fence which is provided by the driver as _a_ result of running a job/jobs." > + * result of running jobs. Drivers need to make sure that the normal > + * DMA-fence semantics are followed for this object. It's important to note "DMA-fence semantics" are mentioned several times so far, and a link to a description of said semantics (or NULL if none is in the kernel)--would be nice to put. 
> + * that the memory for this object can *not* be allocated in the run_job > + * callback since that would violate the requirements for the DMA-fence > + * implementation. Is it "no allocation" or just "allocation for this object" in run_job? (Or maybe no kernel allocation..., we should probably clarify this.) Add: The scheduler maintains a timeout handler which triggers > + * if this fence doesn't signal in a configurable time frame. > + * > + * The lifetime of this object follows DMA-fence ref-counting rules, the > + * scheduler takes ownership of the reference returned by the driver and > + * drops it when it's not needed any more. Errors should also be signaled > + * through the hardware fence and are bubbled up back to the scheduler fence By whom? "through the hardware fence by the driver, and are ..." > + * and entity. > + * > + * 4. The scheduler fence object which encapsulates the whole time from pushing > + * the job into the scheduler until the hardware has finished processing it. Perhaps drop "time encapsulation." It's not very clear what you want to say here. Perhaps, use "exists" and drop "which", as in: 4. The scheduler fence object exists/is ref-ed by DRM, etc., from the time when the job is pushed into the scheduler until the hardware has finished with it. If fence signaling is involved in those two steps, it should be noted here. If this is about ownership, it should be simply stated. > + * This is internally managed by the scheduler, but drivers can grab > + * additional reference to it after arming a job. The implementation Instead of "implementation" perhaps use "DRM scheduler code?" > + * provides DMA-fence interfaces for signaling both scheduling of a command > + * submission as well as finishing of processing. I'd clarify with "provides DMA-fence interfaces for drivers, for the scheduling of a command submission, akin to the start of a command, and finishing command processing." Perhaps we can also mention when drivers are supposed to call these...? > + * > + * The lifetime of this object also follows normal DMA-fence ref-counting > + * rules. The finished fence is the one normally exposed outside of the > + * scheduler, but the driver can grab references to both the scheduled as > + * well as the finished fence when needed for pipe-lining optimizations. > + * > + * 5. The run queue object which is a container of entities for a certain > + * priority level. The lifetime of those objects are bound to the scheduler > + * lifetime. "which is" --> "is" "entities for a certain" --> "entities of a certain" > + * > + * This is internally managed by the scheduler and drivers shouldn't touch > + * them directly. > + * > + * 6. The scheduler object itself which does the actual work of selecting a job > + * and pushing it to the hardware. Both FIFO and RR selection algorithm are > + * supported, but FIFO is preferred for many use cases. Let's drop "which" and just say "The scheduler object does the actual work of ..." > + * > + * The lifetime of this object is managed by the driver using it. Before > + * destroying the scheduler the driver must ensure that all hardware > + * processing involving this scheduler object has finished by calling for > + * example disable_irq(). It is *not* sufficient to wait for the hardware > + * fence here since this doesn't guarantee that all callback processing has > + * finished. Okay, and perhaps we could mention drm_sched_fini() here. 
> + * > + * All callbacks the driver needs to implement are restricted by DMA-fence > + * signaling rules to guarantee deadlock free forward progress. This especially > + * means that for normal operation no memory can be allocated. All memory which Need a pointer to the "DMA-fence signaling rules" and also need to define "normal operation", or clarify it in the sentence. > + * is needed for pushing the job to the hardware must be allocated before > + * arming a job. It also means that no locks can be taken under which memory > + * might be allocated as well. Yes, that makes sense. I think this is generally the case for driver and firmware writers, and I'd think is common sense, but it is good to have it in writing. > + * > + * Memory which is optional to allocate for device core dumping or debugging > + * *must* be allocated with GFP_NOWAIT and appropriate error handling taking if > + * that allocation fails. GFP_ATOMIC should only be used if absolutely > + * necessary since dipping into the special atomic reserves is usually not > + * justified for a GPU driver. Yes, that's well said. Good to have it here. > + * > + * The scheduler also used to provided functionality for re-submitting jobs "The scheduler also used ..." --> "The scheduler is also used ..." > + * with replacing the hardware fence during reset handling. This functionality > + * is now marked as deprecated since this has proven to be fundamentally racy > + * and not compatible with DMA-fence rules and shouldn't be used in any new > + * code. I wouldn't use a contraction here. In order to emphasize that job re-submission is depreciated, I'd use: "and should not be used in any new code." > */ > > #include <linux/kthread.h>
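As a concrete reading of the allocation rules debated above (no allocation in run_job, GFP_NOWAIT for optional debug data), here is a hedged sketch. It uses a variant of the hypothetical struct my_job carrying a pre-allocated hw_fence, and all my_* helpers and MY_SNAPSHOT_SIZE are placeholders for driver code.

```c
#include <drm/gpu_scheduler.h>
#include <linux/slab.h>

#define MY_SNAPSHOT_SIZE	4096	/* hypothetical */

struct my_job {
	struct drm_sched_job base;
	struct dma_fence *hw_fence;	/* allocated before drm_sched_job_arm() */
};

/* Hypothetical driver helpers. */
void my_ring_emit(struct my_job *job);
void my_capture_state(struct my_job *job, void *buf);
void my_reset_ring(struct my_job *job);

static struct dma_fence *my_run_job(struct drm_sched_job *sched_job)
{
	struct my_job *job = container_of(sched_job, struct my_job, base);

	/* No allocation here: the hardware fence already exists, run_job
	 * only kicks off the submission and hands a reference back to the
	 * scheduler. */
	my_ring_emit(job);
	return dma_fence_get(job->hw_fence);
}

static enum drm_gpu_sched_stat my_timedout_job(struct drm_sched_job *sched_job)
{
	struct my_job *job = container_of(sched_job, struct my_job, base);
	void *snapshot;

	/* Optional debug/core-dump data: GFP_NOWAIT with graceful failure,
	 * GFP_ATOMIC only if really justified. */
	snapshot = kzalloc(MY_SNAPSHOT_SIZE, GFP_NOWAIT);
	if (snapshot)
		my_capture_state(job, snapshot);

	my_reset_ring(job);
	return DRM_GPU_SCHED_STAT_NOMINAL;
}
```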
Am 17.11.23 um 04:18 schrieb Luben Tuikov: > On 2023-11-13 07:38, Christian König wrote: >> Start to improve the scheduler document. Especially document the >> lifetime of each of the objects as well as the restrictions around >> DMA-fence handling and userspace compatibility. > Thanks Christian for doing this--much needed. > >> Signed-off-by: Christian König <christian.koenig@amd.com> >> --- >> drivers/gpu/drm/scheduler/sched_main.c | 126 ++++++++++++++++++++----- >> 1 file changed, 104 insertions(+), 22 deletions(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >> index 506371c42745..36a7c5dc852d 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -24,28 +24,110 @@ >> /** >> * DOC: Overview >> * >> - * The GPU scheduler provides entities which allow userspace to push jobs >> - * into software queues which are then scheduled on a hardware run queue. >> - * The software queues have a priority among them. The scheduler selects the entities >> - * from the run queue using a FIFO. The scheduler provides dependency handling >> - * features among jobs. The driver is supposed to provide callback functions for >> - * backend operations to the scheduler like submitting a job to hardware run queue, >> - * returning the dependencies of a job etc. > So, I don't mind this paragraph, as it provides an overview or the relationship between > a DRM GPU scheduler, entities, run-queues, and jobs. > >> - * >> - * The organisation of the scheduler is the following: >> - * >> - * 1. Each hw run queue has one scheduler >> - * 2. Each scheduler has multiple run queues with different priorities >> - * (e.g., HIGH_HW,HIGH_SW, KERNEL, NORMAL) >> - * 3. Each scheduler run queue has a queue of entities to schedule >> - * 4. Entities themselves maintain a queue of jobs that will be scheduled on >> - * the hardware. > This is also good, and shouldn't have been deleted. > >> - * >> - * The jobs in a entity are always scheduled in the order that they were pushed. > I'd have said here "jobs within an entity". Again shouldn't have been deleted. > This is good overall/overview information. I was hoping that I have incorporated all this into the per object description text. Did I miss anything? > >> - * >> - * Note that once a job was taken from the entities queue and pushed to the >> - * hardware, i.e. the pending queue, the entity must not be referenced anymore >> - * through the jobs entity pointer. >> + * The GPU scheduler implements some logic to decide which command submission >> + * to push next to the hardware. Another major use case for the GPU scheduler >> + * is to enforce correct driver behavior around those command submission. > The GPU scheduler also enforces correct driver behaviour around those command submissions. > >> + * Because of this it's also used by drivers which don't need the actual >> + * scheduling functionality. > ... but need to push jobs into their firmware/hardware and maintain/keep correct > DRM dependencies in the form of "fences". Good point, going to add that. > >> + * >> + * To fulfill this task the GPU scheduler uses of the following objects: >> + * >> + * 1. The job object which contains a bunch of dependencies in the form of > Drop "which". > > Instead of listing what it contains, how it is used, what it does, explain > what it is: work/task to be executed by the GPU. _Then_ you can start listing > what it contains, how it is used, what it does. > >> + * DMA-fence objects. 
Drivers can also implement an optional prepare_job >> + * callback which returns additional dependencies as DMA-fence objects. > "can also"? This would usually follow if the other callbacks/etc., have been described > and they haven't, so I'd say "Drivers implement an optional prepare_job callback,..." > >> + * It's important to note that this callback must follow the DMA-fence rules, >> + * so it can't easily allocate memory or grab locks under which memory is >> + * allocated. Drivers should use this as base class for an object which >> + * contains the necessary state to push the command submission to the >> + * hardware. >> + * >> + * The lifetime of the job object should at least be from pushing it into the >> + * scheduler until the scheduler notes through the free callback that a job >> + * isn't needed any more. Drivers can of course keep their job object alive >> + * longer than that, but that's outside of the scope of the scheduler >> + * component. > [New paragraph starts describing the job initialization.] > > Add: Job initialization is split into two parts, >> + * drm_sched_job_init() and drm_sched_job_arm(). > Perhaps we should mention briefly what each one does..? > > Add: It's important to note that >> + * after arming a job drivers must follow the DMA-fence rules and can't >> + * easily allocate memory or takes locks under which memory is allocated. >> + * >> + * 2. The entity object which is a container for jobs which should execute > Drop "which". "The entity object is a container of ..." > >> + * sequentially. Drivers should create an entity for each individual context >> + * they maintain for command submissions which can run in parallel. > This isn't very clear, but can be made so by: "they maintain for command submissions." > "Entities' jobs can run in parallel as facilitated by the GPU." > >> + * >> + * The lifetime of the entity should *not* exceed the lifetime of the >> + * userspace process it was created for and drivers should call the >> + * drm_sched_entity_flush() function from their file_operations.flush >> + * callback. > Okay... they should... WHEN? When we use "should do something" we always clarify with "when xyz happens." > > Add: Background is that for compatibility reasons with existing >> + * userspace all results of a command submission should become visible > "Background" --> "The reason for this is for compatibility with existing ..." > >> + * externally even after after a process exits. The only exception to that > Remove one of the two "after". > >> + * is when the process is actively killed by a SIGKILL. In this case the >> + * entity object makes sure that jobs are freed without running them while >> + * still maintaining correct sequential order for signaling fences. So it's >> + * possible that an entity object is not alive any more while jobs from it > "to not be alive" > (This paragraph here, including SIGKILL, could be made clearer, by splitting it in two > parts. One describing normal behaviour, i.e. SIGTERM, and the other describing SIGKILL.) > >> + * are still running on the hardware. >> + * >> + * 3. The hardware fence object which is a DMA-fence provided by the driver as > Drop "which". "The hardware fence object is a DMA-fence which is provided by the driver > as _a_ result of running a job/jobs." > >> + * result of running jobs. Drivers need to make sure that the normal >> + * DMA-fence semantics are followed for this object. 
It's important to note > "DMA-fence semantics" are mentioned several times so far, and a link to a description > of said semantics (or NULL if none is in the kernel)--would be nice to put. > >> + * that the memory for this object can *not* be allocated in the run_job >> + * callback since that would violate the requirements for the DMA-fence >> + * implementation. > Is it "no allocation" or just "allocation for this object" in run_job? > (Or maybe no kernel allocation..., we should probably clarify this.) > > Add: The scheduler maintains a timeout handler which triggers >> + * if this fence doesn't signal in a configurable time frame. >> + * >> + * The lifetime of this object follows DMA-fence ref-counting rules, the >> + * scheduler takes ownership of the reference returned by the driver and >> + * drops it when it's not needed any more. Errors should also be signaled >> + * through the hardware fence and are bubbled up back to the scheduler fence > By whom? > "through the hardware fence by the driver, and are ..." > >> + * and entity. >> + * >> + * 4. The scheduler fence object which encapsulates the whole time from pushing >> + * the job into the scheduler until the hardware has finished processing it. > Perhaps drop "time encapsulation." > > It's not very clear what you want to say here. Perhaps, use "exists" and drop "which", as in: > 4. The scheduler fence object exists/is ref-ed by DRM, etc., from the time when the job is > pushed into the scheduler until the hardware has finished with it. > > If fence signaling is involved in those two steps, it should be noted here. > > If this is about ownership, it should be simply stated. > >> + * This is internally managed by the scheduler, but drivers can grab >> + * additional reference to it after arming a job. The implementation > Instead of "implementation" perhaps use "DRM scheduler code?" > >> + * provides DMA-fence interfaces for signaling both scheduling of a command >> + * submission as well as finishing of processing. > I'd clarify with > "provides DMA-fence interfaces for drivers, for the scheduling of a command > submission, akin to the start of a command, and finishing command processing." > > Perhaps we can also mention when drivers are supposed to call these...? > >> + * >> + * The lifetime of this object also follows normal DMA-fence ref-counting >> + * rules. The finished fence is the one normally exposed outside of the >> + * scheduler, but the driver can grab references to both the scheduled as >> + * well as the finished fence when needed for pipe-lining optimizations. >> + * >> + * 5. The run queue object which is a container of entities for a certain >> + * priority level. The lifetime of those objects are bound to the scheduler >> + * lifetime. > "which is" --> "is" > "entities for a certain" --> "entities of a certain" > >> + * >> + * This is internally managed by the scheduler and drivers shouldn't touch >> + * them directly. >> + * >> + * 6. The scheduler object itself which does the actual work of selecting a job >> + * and pushing it to the hardware. Both FIFO and RR selection algorithm are >> + * supported, but FIFO is preferred for many use cases. > Let's drop "which" and just say "The scheduler object does the actual work of ..." > >> + * >> + * The lifetime of this object is managed by the driver using it. Before >> + * destroying the scheduler the driver must ensure that all hardware >> + * processing involving this scheduler object has finished by calling for >> + * example disable_irq(). 
It is *not* sufficient to wait for the hardware >> + * fence here since this doesn't guarantee that all callback processing has >> + * finished. > Okay, and perhaps we could mention drm_sched_fini() here. > >> + * >> + * All callbacks the driver needs to implement are restricted by DMA-fence >> + * signaling rules to guarantee deadlock free forward progress. This especially >> + * means that for normal operation no memory can be allocated. All memory which > Need a pointer to the "DMA-fence signaling rules" and also need to define "normal operation", > or clarify it in the sentence. > >> + * is needed for pushing the job to the hardware must be allocated before >> + * arming a job. It also means that no locks can be taken under which memory >> + * might be allocated as well. > Yes, that makes sense. I think this is generally the case for driver and firmware > writers, and I'd think is common sense, but it is good to have it in writing. > >> + * >> + * Memory which is optional to allocate for device core dumping or debugging >> + * *must* be allocated with GFP_NOWAIT and appropriate error handling taking if >> + * that allocation fails. GFP_ATOMIC should only be used if absolutely >> + * necessary since dipping into the special atomic reserves is usually not >> + * justified for a GPU driver. > Yes, that's well said. Good to have it here. > >> + * >> + * The scheduler also used to provided functionality for re-submitting jobs > "The scheduler also used ..." --> "The scheduler is also used ..." > >> + * with replacing the hardware fence during reset handling. This functionality >> + * is now marked as deprecated since this has proven to be fundamentally racy >> + * and not compatible with DMA-fence rules and shouldn't be used in any new >> + * code. > I wouldn't use a contraction here. In order to emphasize that job re-submission > is depreciated, I'd use: > "and should not be used in any new code." I only skimmed over the rest of the comment. Danilo already suggested some similar notes, especially on the style which resulted in a v2 of the documentation. Going over this here later today and will probably send out a v3 on Monday. Thanks, Christian. > >> */ >> >> #include <linux/kthread.h>
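Regarding the suggestion to mention drm_sched_fini() in the teardown paragraph, a rough sketch of the ordering being described follows; struct my_device and my_ring_stop() are made up, only disable_irq() and drm_sched_fini() are real API.

```c
#include <drm/gpu_scheduler.h>
#include <linux/interrupt.h>

struct my_device {
	struct drm_gpu_scheduler sched;
	unsigned int irq;	/* interrupt that signals hardware fences */
};

void my_ring_stop(struct my_device *mdev);	/* hypothetical */

static void my_device_fini(struct my_device *mdev)
{
	/*
	 * Waiting on the last hardware fence is not enough, fence callbacks
	 * may still be executing; stop the hardware and the interrupt that
	 * signals the fences first.
	 */
	my_ring_stop(mdev);
	disable_irq(mdev->irq);

	/* Only now is it safe to tear the scheduler down. */
	drm_sched_fini(&mdev->sched);
}
```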
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 506371c42745..36a7c5dc852d 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -24,28 +24,110 @@ /** * DOC: Overview * - * The GPU scheduler provides entities which allow userspace to push jobs - * into software queues which are then scheduled on a hardware run queue. - * The software queues have a priority among them. The scheduler selects the entities - * from the run queue using a FIFO. The scheduler provides dependency handling - * features among jobs. The driver is supposed to provide callback functions for - * backend operations to the scheduler like submitting a job to hardware run queue, - * returning the dependencies of a job etc. - * - * The organisation of the scheduler is the following: - * - * 1. Each hw run queue has one scheduler - * 2. Each scheduler has multiple run queues with different priorities - * (e.g., HIGH_HW,HIGH_SW, KERNEL, NORMAL) - * 3. Each scheduler run queue has a queue of entities to schedule - * 4. Entities themselves maintain a queue of jobs that will be scheduled on - * the hardware. - * - * The jobs in a entity are always scheduled in the order that they were pushed. - * - * Note that once a job was taken from the entities queue and pushed to the - * hardware, i.e. the pending queue, the entity must not be referenced anymore - * through the jobs entity pointer. + * The GPU scheduler implements some logic to decide which command submission + * to push next to the hardware. Another major use case for the GPU scheduler + * is to enforce correct driver behavior around those command submission. + * Because of this it's also used by drivers which don't need the actual + * scheduling functionality. + * + * To fulfill this task the GPU scheduler uses of the following objects: + * + * 1. The job object which contains a bunch of dependencies in the form of + * DMA-fence objects. Drivers can also implement an optional prepare_job + * callback which returns additional dependencies as DMA-fence objects. + * It's important to note that this callback must follow the DMA-fence rules, + * so it can't easily allocate memory or grab locks under which memory is + * allocated. Drivers should use this as base class for an object which + * contains the necessary state to push the command submission to the + * hardware. + * + * The lifetime of the job object should at least be from pushing it into the + * scheduler until the scheduler notes through the free callback that a job + * isn't needed any more. Drivers can of course keep their job object alive + * longer than that, but that's outside of the scope of the scheduler + * component. Job initialization is split into two parts, + * drm_sched_job_init() and drm_sched_job_arm(). It's important to note that + * after arming a job drivers must follow the DMA-fence rules and can't + * easily allocate memory or takes locks under which memory is allocated. + * + * 2. The entity object which is a container for jobs which should execute + * sequentially. Drivers should create an entity for each individual context + * they maintain for command submissions which can run in parallel. + * + * The lifetime of the entity should *not* exceed the lifetime of the + * userspace process it was created for and drivers should call the + * drm_sched_entity_flush() function from their file_operations.flush + * callback. 
Background is that for compatibility reasons with existing + * userspace all results of a command submission should become visible + * externally even after after a process exits. The only exception to that + * is when the process is actively killed by a SIGKILL. In this case the + * entity object makes sure that jobs are freed without running them while + * still maintaining correct sequential order for signaling fences. So it's + * possible that an entity object is not alive any more while jobs from it + * are still running on the hardware. + * + * 3. The hardware fence object which is a DMA-fence provided by the driver as + * result of running jobs. Drivers need to make sure that the normal + * DMA-fence semantics are followed for this object. It's important to note + * that the memory for this object can *not* be allocated in the run_job + * callback since that would violate the requirements for the DMA-fence + * implementation. The scheduler maintains a timeout handler which triggers + * if this fence doesn't signal in a configurable time frame. + * + * The lifetime of this object follows DMA-fence ref-counting rules, the + * scheduler takes ownership of the reference returned by the driver and + * drops it when it's not needed any more. Errors should also be signaled + * through the hardware fence and are bubbled up back to the scheduler fence + * and entity. + * + * 4. The scheduler fence object which encapsulates the whole time from pushing + * the job into the scheduler until the hardware has finished processing it. + * This is internally managed by the scheduler, but drivers can grab + * additional reference to it after arming a job. The implementation + * provides DMA-fence interfaces for signaling both scheduling of a command + * submission as well as finishing of processing. + * + * The lifetime of this object also follows normal DMA-fence ref-counting + * rules. The finished fence is the one normally exposed outside of the + * scheduler, but the driver can grab references to both the scheduled as + * well as the finished fence when needed for pipe-lining optimizations. + * + * 5. The run queue object which is a container of entities for a certain + * priority level. The lifetime of those objects are bound to the scheduler + * lifetime. + * + * This is internally managed by the scheduler and drivers shouldn't touch + * them directly. + * + * 6. The scheduler object itself which does the actual work of selecting a job + * and pushing it to the hardware. Both FIFO and RR selection algorithm are + * supported, but FIFO is preferred for many use cases. + * + * The lifetime of this object is managed by the driver using it. Before + * destroying the scheduler the driver must ensure that all hardware + * processing involving this scheduler object has finished by calling for + * example disable_irq(). It is *not* sufficient to wait for the hardware + * fence here since this doesn't guarantee that all callback processing has + * finished. + * + * All callbacks the driver needs to implement are restricted by DMA-fence + * signaling rules to guarantee deadlock free forward progress. This especially + * means that for normal operation no memory can be allocated. All memory which + * is needed for pushing the job to the hardware must be allocated before + * arming a job. It also means that no locks can be taken under which memory + * might be allocated as well. 
+ * + * Memory which is optional to allocate for device core dumping or debugging + * *must* be allocated with GFP_NOWAIT and appropriate error handling taking if + * that allocation fails. GFP_ATOMIC should only be used if absolutely + * necessary since dipping into the special atomic reserves is usually not + * justified for a GPU driver. + * + * The scheduler also used to provided functionality for re-submitting jobs + * with replacing the hardware fence during reset handling. This functionality + * is now marked as deprecated since this has proven to be fundamentally racy + * and not compatible with DMA-fence rules and shouldn't be used in any new + * code. */ #include <linux/kthread.h>
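Tying the callbacks from the DOC text above together, here is a sketch of the backend ops a driver would register. It reuses the hypothetical my_run_job()/my_timedout_job() from the earlier sketch on this page, and my_next_ready_fence() is likewise an invented placeholder.

```c
#include <drm/gpu_scheduler.h>
#include <linux/slab.h>

struct dma_fence *my_next_ready_fence(struct my_job *job);	/* hypothetical */

static struct dma_fence *my_prepare_job(struct drm_sched_job *sched_job,
					struct drm_sched_entity *s_entity)
{
	struct my_job *job = container_of(sched_job, struct my_job, base);

	/* Return additional dependencies one at a time, NULL when there are
	 * none left; the DMA-fence rules apply here as well. */
	return my_next_ready_fence(job);
}

static void my_free_job(struct drm_sched_job *sched_job)
{
	struct my_job *job = container_of(sched_job, struct my_job, base);

	/* The scheduler is done with the job, end of its required lifetime. */
	drm_sched_job_cleanup(sched_job);
	kfree(job);
}

static const struct drm_sched_backend_ops my_sched_ops = {
	.prepare_job	= my_prepare_job,	/* optional */
	.run_job	= my_run_job,		/* see sketch further up */
	.timedout_job	= my_timedout_job,	/* see sketch further up */
	.free_job	= my_free_job,
};
```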
Start to improve the scheduler document. Especially document the
lifetime of each of the objects as well as the restrictions around
DMA-fence handling and userspace compatibility.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 126 ++++++++++++++++++++-----
 1 file changed, 104 insertions(+), 22 deletions(-)