Message ID | 20230116072234.3970768-1-wenst@chromium.org (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | drm/bridge: anx7625: Drop device lock before drm_helper_hpd_irq_event() | expand |
Hi, On Mon, Jan 16, 2023 at 03:22:34PM +0800, Chen-Yu Tsai wrote: > The device lock is used to serialize the low level power sequencing > operations. Since drm_helper_hpd_irq_event() could end up calling > .atomic_enable, which also calls power sequencing functions through > runtime PM, this results in a real deadlock. This was observed on an > MT8192-based Chromebook's external display (with appropriate patches [1] > and DT changes applied). > > Move the drm_helper_hpd_irq_event() call outside of the lock range. The > lock only needs to be held so that the device status can be read back. > This is the bare minimum change to avoid the deadlock. The lock could > be dropped completely and have pm_runtime_get_if_in_use() increase the > reference count, but this is not the same as pm_runtime_suspended(). > This also causes the internal display of the same device to not > function correctly. Both the internal and external display of said > device each use one anx7625 bridge. > > [1] https://lore.kernel.org/dri-devel/20230112042104.4107253-1-treapking@chromium.org/ > > Fixes: 60487584a79a ("drm/bridge: anx7625: refactor power control to use runtime PM framework") > Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> > --- > FWIW I'm aware that this driver could be refactored a lot better. > The work function might be simplified and merged into the threaded > interrupt handler. The .detect op should be reading the HPD state > from the hardware, not some cached state. > > drivers/gpu/drm/bridge/analogix/anx7625.c | 13 ++++++------- > 1 file changed, 6 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/bridge/analogix/anx7625.c b/drivers/gpu/drm/bridge/analogix/anx7625.c > index 7e1fb93a6ce4..bf1770b79bba 100644 > --- a/drivers/gpu/drm/bridge/analogix/anx7625.c > +++ b/drivers/gpu/drm/bridge/analogix/anx7625.c > @@ -1597,18 +1597,17 @@ static void anx7625_work_func(struct work_struct *work) > > mutex_lock(&ctx->lock); > > - if (pm_runtime_suspended(&ctx->client->dev)) > - goto unlock; > + if (pm_runtime_suspended(&ctx->client->dev)) { > + mutex_unlock(&ctx->lock); > + return; > + } > > event = anx7625_hpd_change_detect(ctx); > - if (event < 0) Are you intentionally dropping this early-return on error? > - goto unlock; > + > + mutex_unlock(&ctx->lock); > > if (ctx->bridge_attached) > drm_helper_hpd_irq_event(ctx->bridge.dev); > - > -unlock: > - mutex_unlock(&ctx->lock); > } > > static irqreturn_t anx7625_intr_hpd_isr(int irq, void *data) > -- > 2.39.0.314.g84b9a713c41-goog >
diff --git a/drivers/gpu/drm/bridge/analogix/anx7625.c b/drivers/gpu/drm/bridge/analogix/anx7625.c index 7e1fb93a6ce4..bf1770b79bba 100644 --- a/drivers/gpu/drm/bridge/analogix/anx7625.c +++ b/drivers/gpu/drm/bridge/analogix/anx7625.c @@ -1597,18 +1597,17 @@ static void anx7625_work_func(struct work_struct *work) mutex_lock(&ctx->lock); - if (pm_runtime_suspended(&ctx->client->dev)) - goto unlock; + if (pm_runtime_suspended(&ctx->client->dev)) { + mutex_unlock(&ctx->lock); + return; + } event = anx7625_hpd_change_detect(ctx); - if (event < 0) - goto unlock; + + mutex_unlock(&ctx->lock); if (ctx->bridge_attached) drm_helper_hpd_irq_event(ctx->bridge.dev); - -unlock: - mutex_unlock(&ctx->lock); } static irqreturn_t anx7625_intr_hpd_isr(int irq, void *data)
The device lock is used to serialize the low level power sequencing operations. Since drm_helper_hpd_irq_event() could end up calling .atomic_enable, which also calls power sequencing functions through runtime PM, this results in a real deadlock. This was observed on an MT8192-based Chromebook's external display (with appropriate patches [1] and DT changes applied). Move the drm_helper_hpd_irq_event() call outside of the lock range. The lock only needs to be held so that the device status can be read back. This is the bare minimum change to avoid the deadlock. The lock could be dropped completely and have pm_runtime_get_if_in_use() increase the reference count, but this is not the same as pm_runtime_suspended(). This also causes the internal display of the same device to not function correctly. Both the internal and external display of said device each use one anx7625 bridge. [1] https://lore.kernel.org/dri-devel/20230112042104.4107253-1-treapking@chromium.org/ Fixes: 60487584a79a ("drm/bridge: anx7625: refactor power control to use runtime PM framework") Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> --- FWIW I'm aware that this driver could be refactored a lot better. The work function might be simplified and merged into the threaded interrupt handler. The .detect op should be reading the HPD state from the hardware, not some cached state. drivers/gpu/drm/bridge/analogix/anx7625.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)