Message ID | 1594733130-398-1-git-send-email-akhilpo@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm: msm: a6xx: fix gpu failure after system resume | expand |
On Tue, Jul 14, 2020 at 06:55:30PM +0530, Akhil P Oommen wrote: > On targets where GMU is available, GMU takes over the ownership of GX GDSC > during its initialization. So, take a refcount on the GX PD on behalf of > GMU before we initialize it. This makes sure that nobody can collapse the > GX GDSC once GMU owns the GX GDSC. This patch fixes some weird failures > during GPU wake up during system resume. The change looks fine but this explanation is confusing. When I read it I thought "oh, man, we weren't taking a reference to the GX PD during resume???" but that's not really the case. We *are* taking a reference, just not soon enough to avoid possible issues. It would be helpful if you reworded this to explain that you are moving the reference and perhaps to shine a bit more light on what the "weird" failures are. Jordan > Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org> > --- > drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 18 ++++++++++-------- > 1 file changed, 10 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c > index a6f43ff..5b2df7d 100644 > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c > @@ -873,10 +873,19 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu) > /* Turn on the resources */ > pm_runtime_get_sync(gmu->dev); > > + /* > + * "enable" the GX power domain which won't actually do anything but it > + * will make sure that the refcounting is correct in case we need to > + * bring down the GX after a GMU failure > + */ > + if (!IS_ERR_OR_NULL(gmu->gxpd)) > + pm_runtime_get_sync(gmu->gxpd); > + > /* Use a known rate to bring up the GMU */ > clk_set_rate(gmu->core_clk, 200000000); > ret = clk_bulk_prepare_enable(gmu->nr_clocks, gmu->clocks); > if (ret) { > + pm_runtime_put(gmu->gxpd); > pm_runtime_put(gmu->dev); > return ret; > } > @@ -919,19 +928,12 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu) > /* Set the GPU to the current freq */ > a6xx_gmu_set_initial_freq(gpu, gmu); > > - /* > - * "enable" the GX power domain which won't actually do anything but it > - * will make sure that the refcounting is correct in case we need to > - * bring down the GX after a GMU failure > - */ > - if (!IS_ERR_OR_NULL(gmu->gxpd)) > - pm_runtime_get(gmu->gxpd); > - > out: > /* On failure, shut down the GMU to leave it in a good state */ > if (ret) { > disable_irq(gmu->gmu_irq); > a6xx_rpmh_stop(gmu); > + pm_runtime_put(gmu->gxpd); > pm_runtime_put(gmu->dev); > } > > -- > 2.7.4 >
On Tue, Jul 14, 2020 at 06:55:30PM +0530, Akhil P Oommen wrote: > On targets where GMU is available, GMU takes over the ownership of GX GDSC > during its initialization. So, take a refcount on the GX PD on behalf of > GMU before we initialize it. This makes sure that nobody can collapse the > GX GDSC once GMU owns the GX GDSC. This patch fixes some weird failures > during GPU wake up during system resume. > > Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org> I went through a few dozen suspend/resume cycles on SC7180 and didn't run into the kernel panic that typically occurs after a few iterations without this patch. Reported-by: Matthias Kaehlcke <mka@chromium.org> Tested-by: Matthias Kaehlcke <mka@chromium.org> On which tree is this patch based on? I had to apply it manually because 'git am' is unhappy when I try to apply it: error: sha1 information is lacking or useless (drivers/gpu/drm/msm/adreno/a6xx_gmu.c). error: could not build fake ancestor Both upstream and drm-msm are in my remotes and synced, so I suspect it's some private tree. Please make sure to base patches on the corresponding maintainer tree or upstream, whichs makes life easier for maintainers, testers and reviewers.
On Tue, Jul 14, 2020 at 10:10 AM Matthias Kaehlcke <mka@chromium.org> wrote: > > On Tue, Jul 14, 2020 at 06:55:30PM +0530, Akhil P Oommen wrote: > > On targets where GMU is available, GMU takes over the ownership of GX GDSC > > during its initialization. So, take a refcount on the GX PD on behalf of > > GMU before we initialize it. This makes sure that nobody can collapse the > > GX GDSC once GMU owns the GX GDSC. This patch fixes some weird failures > > during GPU wake up during system resume. > > > > Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org> > > I went through a few dozen suspend/resume cycles on SC7180 and didn't run > into the kernel panic that typically occurs after a few iterations without > this patch. > > Reported-by: Matthias Kaehlcke <mka@chromium.org> > Tested-by: Matthias Kaehlcke <mka@chromium.org> > > On which tree is this patch based on? I had to apply it manually because > 'git am' is unhappy when I try to apply it: > > error: sha1 information is lacking or useless (drivers/gpu/drm/msm/adreno/a6xx_gmu.c). > error: could not build fake ancestor > > Both upstream and drm-msm are in my remotes and synced, so I suspect it's > some private tree. Please make sure to base patches on the corresponding > maintainer tree or upstream, whichs makes life easier for maintainers, > testers and reviewers. I've run into the same issue frequently :-( BR, -R
On 7/15/2020 12:12 AM, Rob Clark wrote: > On Tue, Jul 14, 2020 at 10:10 AM Matthias Kaehlcke <mka@chromium.org> wrote: >> >> On Tue, Jul 14, 2020 at 06:55:30PM +0530, Akhil P Oommen wrote: >>> On targets where GMU is available, GMU takes over the ownership of GX GDSC >>> during its initialization. So, take a refcount on the GX PD on behalf of >>> GMU before we initialize it. This makes sure that nobody can collapse the >>> GX GDSC once GMU owns the GX GDSC. This patch fixes some weird failures >>> during GPU wake up during system resume. >>> >>> Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org> >> >> I went through a few dozen suspend/resume cycles on SC7180 and didn't run >> into the kernel panic that typically occurs after a few iterations without >> this patch. >> >> Reported-by: Matthias Kaehlcke <mka@chromium.org> >> Tested-by: Matthias Kaehlcke <mka@chromium.org> >> >> On which tree is this patch based on? I had to apply it manually because >> 'git am' is unhappy when I try to apply it: >> >> error: sha1 information is lacking or useless (drivers/gpu/drm/msm/adreno/a6xx_gmu.c). >> error: could not build fake ancestor >> >> Both upstream and drm-msm are in my remotes and synced, so I suspect it's >> some private tree. Please make sure to base patches on the corresponding >> maintainer tree or upstream, whichs makes life easier for maintainers, >> testers and reviewers. > > I've run into the same issue frequently :-( > > BR, > -R > Sorry, I was using msm-next brand as the base, but had the opp-next branch merged too inadvertently. -Akhil
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c index a6f43ff..5b2df7d 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c @@ -873,10 +873,19 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu) /* Turn on the resources */ pm_runtime_get_sync(gmu->dev); + /* + * "enable" the GX power domain which won't actually do anything but it + * will make sure that the refcounting is correct in case we need to + * bring down the GX after a GMU failure + */ + if (!IS_ERR_OR_NULL(gmu->gxpd)) + pm_runtime_get_sync(gmu->gxpd); + /* Use a known rate to bring up the GMU */ clk_set_rate(gmu->core_clk, 200000000); ret = clk_bulk_prepare_enable(gmu->nr_clocks, gmu->clocks); if (ret) { + pm_runtime_put(gmu->gxpd); pm_runtime_put(gmu->dev); return ret; } @@ -919,19 +928,12 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu) /* Set the GPU to the current freq */ a6xx_gmu_set_initial_freq(gpu, gmu); - /* - * "enable" the GX power domain which won't actually do anything but it - * will make sure that the refcounting is correct in case we need to - * bring down the GX after a GMU failure - */ - if (!IS_ERR_OR_NULL(gmu->gxpd)) - pm_runtime_get(gmu->gxpd); - out: /* On failure, shut down the GMU to leave it in a good state */ if (ret) { disable_irq(gmu->gmu_irq); a6xx_rpmh_stop(gmu); + pm_runtime_put(gmu->gxpd); pm_runtime_put(gmu->dev); }
On targets where GMU is available, GMU takes over the ownership of GX GDSC during its initialization. So, take a refcount on the GX PD on behalf of GMU before we initialize it. This makes sure that nobody can collapse the GX GDSC once GMU owns the GX GDSC. This patch fixes some weird failures during GPU wake up during system resume. Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org> --- drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)