Message ID | 20211222001728.2514705-1-l.stach@pengutronix.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/etnaviv: consider completed fence seqno in hang check | expand |
Am Mi., 22. Dez. 2021 um 01:17 Uhr schrieb Lucas Stach <l.stach@pengutronix.de>: > > Some GPU heavy test programs manage to trigger the hangcheck quite often. > If there are no other GPU users in the system and the test program > exhibits a very regular structure in the commandstreams that are being > submitted, we can end up with two distinct submits managing to trigger > the hangcheck with the FE in a very similar address range. This leads > the hangcheck to believe that the GPU is stuck, while in reality the GPU > is already busy working on a different job. To avoid those spurious > GPU resets, also remember and consider the last completed fence seqno > in the hang check. > > Reported-by: Joerg Albert <joerg.albert@iav.de> > Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h index 1c75c8ed5bce..85eddd492774 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h @@ -130,6 +130,7 @@ struct etnaviv_gpu { /* hang detection */ u32 hangcheck_dma_addr; + u32 hangcheck_fence; void __iomem *mmio; int irq; diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c index 180bb633d5c5..58f593b278c1 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c @@ -107,8 +107,10 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job */ dma_addr = gpu_read(gpu, VIVS_FE_DMA_ADDRESS); change = dma_addr - gpu->hangcheck_dma_addr; - if (change < 0 || change > 16) { + if (gpu->completed_fence != gpu->hangcheck_fence || + change < 0 || change > 16) { gpu->hangcheck_dma_addr = dma_addr; + gpu->hangcheck_fence = gpu->completed_fence; goto out_no_timeout; }
Some GPU heavy test programs manage to trigger the hangcheck quite often. If there are no other GPU users in the system and the test program exhibits a very regular structure in the commandstreams that are being submitted, we can end up with two distinct submits managing to trigger the hangcheck with the FE in a very similar address range. This leads the hangcheck to believe that the GPU is stuck, while in reality the GPU is already busy working on a different job. To avoid those spurious GPU resets, also remember and consider the last completed fence seqno in the hang check. Reported-by: Joerg Albert <joerg.albert@iav.de> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> --- drivers/gpu/drm/etnaviv/etnaviv_gpu.h | 1 + drivers/gpu/drm/etnaviv/etnaviv_sched.c | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-)