Message ID | 20240425071837.529039-2-boris.brezillon@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/panthor: Collection of tiler heap related fixes |
On 25/04/2024 08:18, Boris Brezillon wrote:
> From: Antonino Maniscalco <antonino.maniscalco@collabora.com>
>
> If the kernel couldn't allocate memory because we reached the maximum
> number of chunks but no render passes are in flight
> (panthor_heap_grow() returning -ENOMEM), we should defer the OOM
> handling to the FW by returning a NULL chunk. The FW will then call
> the tiler OOM exception handler, which is supposed to implement
> incremental rendering (execute an intermediate fragment job to flush
> the pending primitives, release the tiler memory that was used to
> store those primitives, and start over from where it stopped).
>
> Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
> Signed-off-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Steven Price <steven.price@arm.com>

Although I think the real issue here is that we haven't clearly defined
the return values from panthor_heap_grow - it's a bit weird to have two
different error codes for the same "try again later after incremental
rendering" result. But as a fix this seems most clear.

Steve

> ---
>  drivers/gpu/drm/panthor/panthor_sched.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index b3a51a6de523..6de8c0c702cb 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -1354,7 +1354,13 @@ static int group_process_tiler_oom(struct panthor_group *group, u32 cs_id)
>  					    pending_frag_count, &new_chunk_va);
>  	}
>
> -	if (ret && ret != -EBUSY) {
> +	/* If the kernel couldn't allocate memory because we reached the maximum
> +	 * number of chunks (EBUSY if we have render passes in flight, ENOMEM
> +	 * otherwise), we want to let the FW try to reclaim memory by waiting
> +	 * for fragment jobs to land or by executing the tiler OOM exception
> +	 * handler, which is supposed to implement incremental rendering.
> +	 */
> +	if (ret && ret != -EBUSY && ret != -ENOMEM) {
>  		drm_warn(&ptdev->base, "Failed to extend the tiler heap\n");
>  		group->fatal_queues |= BIT(cs_id);
>  		sched_queue_delayed_work(sched, tick, 0);
On Thu, 25 Apr 2024 10:28:49 +0100
Steven Price <steven.price@arm.com> wrote:

> On 25/04/2024 08:18, Boris Brezillon wrote:
> > From: Antonino Maniscalco <antonino.maniscalco@collabora.com>
> >
> > If the kernel couldn't allocate memory because we reached the maximum
> > number of chunks but no render passes are in flight
> > (panthor_heap_grow() returning -ENOMEM), we should defer the OOM
> > handling to the FW by returning a NULL chunk. The FW will then call
> > the tiler OOM exception handler, which is supposed to implement
> > incremental rendering (execute an intermediate fragment job to flush
> > the pending primitives, release the tiler memory that was used to
> > store those primitives, and start over from where it stopped).
> >
> > Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
> > Signed-off-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>
> Reviewed-by: Steven Price <steven.price@arm.com>
>
> Although I think the real issue here is that we haven't clearly defined
> the return values from panthor_heap_grow - it's a bit weird to have two
> different error codes for the same "try again later after incremental
> rendering" result. But as a fix this seems most clear.

Yeah, I actually considered returning -EBUSY for the 'max_chunks
reached' situation, but then realized we would also want to trigger
incremental rendering for actual mem allocation failures (when
chunk_count < max_chunks) once the fail-able/non-blocking allocation
logic is implemented, and for this kind of failure it makes more sense
to return -ENOMEM, even though this implies checking against two values
instead of one.

I guess returning -ENOMEM instead of -EBUSY for the case where we have
render passes in-flight wouldn't be too awkward, as this can be seen as
the kernel refusing to allocate more memory.
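The return-value convention discussed in this reply can be pictured with a small userspace model. This is only a sketch built from the description in the thread, not the panthor driver code: `struct heap_model`, `heap_grow_model()`, and the `alloc_succeeds` flag are hypothetical names introduced for illustration, and the real panthor_heap_grow() may apply additional conditions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a tiler heap (illustration only). */
struct heap_model {
	unsigned int chunk_count;	/* chunks currently allocated */
	unsigned int max_chunks;	/* upper bound on the number of chunks */
};

/*
 * Model of the outcomes described in the thread:
 *   0        a chunk was added
 *   -EBUSY   limit reached while render passes are in flight
 *   -ENOMEM  limit reached with nothing in flight, or the allocation
 *            itself failed (the future fail-able allocation case)
 */
static int heap_grow_model(struct heap_model *heap,
			   unsigned int renderpasses_in_flight,
			   bool alloc_succeeds)
{
	assert(heap->chunk_count <= heap->max_chunks);

	if (heap->chunk_count == heap->max_chunks)
		return renderpasses_in_flight > 0 ? -EBUSY : -ENOMEM;

	/* Below the limit: only a real allocation failure can stop us. */
	if (!alloc_succeeds)
		return -ENOMEM;

	heap->chunk_count++;
	return 0;
}
```

In this model both error codes mean "let the FW find memory another way", which is why a caller has to check against two values, as noted above.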
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index b3a51a6de523..6de8c0c702cb 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -1354,7 +1354,13 @@ static int group_process_tiler_oom(struct panthor_group *group, u32 cs_id)
 					    pending_frag_count, &new_chunk_va);
 	}
 
-	if (ret && ret != -EBUSY) {
+	/* If the kernel couldn't allocate memory because we reached the maximum
+	 * number of chunks (EBUSY if we have render passes in flight, ENOMEM
+	 * otherwise), we want to let the FW try to reclaim memory by waiting
+	 * for fragment jobs to land or by executing the tiler OOM exception
+	 * handler, which is supposed to implement incremental rendering.
+	 */
+	if (ret && ret != -EBUSY && ret != -ENOMEM) {
 		drm_warn(&ptdev->base, "Failed to extend the tiler heap\n");
 		group->fatal_queues |= BIT(cs_id);
 		sched_queue_delayed_work(sched, tick, 0);
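The effect of the changed condition can be checked in isolation with a userspace sketch. The two predicates below copy the old and new `if` conditions from the diff; the function names `tiler_oom_is_fatal_old()` and `tiler_oom_is_fatal_new()` are hypothetical and exist only for this illustration, they are not part of the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Condition before the patch: anything but success or -EBUSY is fatal. */
static bool tiler_oom_is_fatal_old(int ret)
{
	assert(ret <= 0);	/* kernel-style: 0 or a negative errno */
	return ret && ret != -EBUSY;
}

/* Condition after the patch: -ENOMEM is also deferred to the FW. */
static bool tiler_oom_is_fatal_new(int ret)
{
	assert(ret <= 0);	/* kernel-style: 0 or a negative errno */
	return ret && ret != -EBUSY && ret != -ENOMEM;
}
```

With these definitions, -ENOMEM is fatal under the old predicate and non-fatal under the new one, while any other error such as -EINVAL remains fatal under both.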