Message ID | 20210127233946.1286386-2-eric@anholt.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/3] drm/msm: Fix race of GPU init vs timestamp power management. | expand |
On Wed, Jan 27, 2021 at 03:39:45PM -0800, Eric Anholt wrote: > Now that we're not racing with GPU setup, also fix races of timestamps > against other timestamps. In CI, we were seeing this path trigger > timeouts on setting the GMU bit, especially on the first set of tests > right after boot (it's probably easier to lose the race than one might > think, given that we start many tests in parallel, and waiting for NFS > to page in code probably means that lots of tests hit the same point > of screen init at the same time). > > Signed-off-by: Eric Anholt <eric@anholt.net> > Cc: stable@vger.kernel.org # v5.9 The joys of not having a global mutex locking everything. Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org> > --- > drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > index 7424a70b9d35..e8f0b5325a7f 100644 > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > @@ -1175,6 +1175,9 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value) > { > struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); > struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu); > + static DEFINE_MUTEX(perfcounter_oob); > + > + mutex_lock(&perfcounter_oob); > > /* Force the GPU power on so we can read this register */ > a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_PERFCOUNTER_SET); > @@ -1183,6 +1186,7 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value) > REG_A6XX_RBBM_PERFCTR_CP_0_HI); > > a6xx_gmu_clear_oob(&a6xx_gpu->gmu, GMU_OOB_PERFCOUNTER_SET); > + mutex_unlock(&perfcounter_oob); > return 0; > } > > -- > 2.30.0 >
On Wed, Jan 27, 2021 at 3:39 PM Eric Anholt <eric@anholt.net> wrote: > > Now that we're not racing with GPU setup, also fix races of timestamps > against other timestamps. In CI, we were seeing this path trigger > timeouts on setting the GMU bit, especially on the first set of tests > right after boot (it's probably easier to lose the race than one might > think, given that we start many tests in parallel, and waiting for NFS > to page in code probably means that lots of tests hit the same point > of screen init at the same time). Could you add the error msg to the commit msg, to make it more easily searchable? BR, -R > Signed-off-by: Eric Anholt <eric@anholt.net> > Cc: stable@vger.kernel.org # v5.9 > --- > drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > index 7424a70b9d35..e8f0b5325a7f 100644 > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > @@ -1175,6 +1175,9 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value) > { > struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); > struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu); > + static DEFINE_MUTEX(perfcounter_oob); > + > + mutex_lock(&perfcounter_oob); > > /* Force the GPU power on so we can read this register */ > a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_PERFCOUNTER_SET); > @@ -1183,6 +1186,7 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value) > REG_A6XX_RBBM_PERFCTR_CP_0_HI); > > a6xx_gmu_clear_oob(&a6xx_gpu->gmu, GMU_OOB_PERFCOUNTER_SET); > + mutex_unlock(&perfcounter_oob); > return 0; > } > > -- > 2.30.0 >
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index 7424a70b9d35..e8f0b5325a7f 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -1175,6 +1175,9 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value) { struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu); + static DEFINE_MUTEX(perfcounter_oob); + + mutex_lock(&perfcounter_oob); /* Force the GPU power on so we can read this register */ a6xx_gmu_set_oob(&a6xx_gpu->gmu, GMU_OOB_PERFCOUNTER_SET); @@ -1183,6 +1186,7 @@ static int a6xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value) REG_A6XX_RBBM_PERFCTR_CP_0_HI); a6xx_gmu_clear_oob(&a6xx_gpu->gmu, GMU_OOB_PERFCOUNTER_SET); + mutex_unlock(&perfcounter_oob); return 0; }
Now that we're not racing with GPU setup, also fix races of timestamps against other timestamps. In CI, we were seeing this path trigger timeouts on setting the GMU bit, especially on the first set of tests right after boot (it's probably easier to lose the race than one might think, given that we start many tests in parallel, and waiting for NFS to page in code probably means that lots of tests hit the same point of screen init at the same time). Signed-off-by: Eric Anholt <eric@anholt.net> Cc: stable@vger.kernel.org # v5.9 --- drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 ++++ 1 file changed, 4 insertions(+)