Message ID | 1652313768-16286-1-git-send-email-quic_khsieh@quicinc.com (mailing list archive) |
---|---|
State | Superseded |
Series | [v5] drm/msm/dp: Always clear mask bits to disable interrupts at dp_ctrl_reset_irq_ctrl() |
On 12/05/2022 03:02, Kuogee Hsieh wrote:
> dp_catalog_ctrl_reset() will software reset DP controller. But it will
> not reset programmable registers to default value. DP driver still have
> to clear mask bits to interrupt status registers to disable interrupts
> after software reset of controller.
>
> At current implementation, dp_ctrl_reset_irq_ctrl() will software reset dp
> controller but did not call dp_catalog_ctrl_enable_irq(false) to clear hpd
> related interrupt mask bits to disable hpd related interrupts due to it
> mistakenly think hpd related interrupt mask bits will be cleared by software
> reset of dp controller automatically. This mistake may cause system to crash
> during suspending procedure due to unexpected irq fired and trigger event
> thread to access dp controller registers with controller clocks are disabled.
>
> This patch fixes system crash during suspending problem by removing "enable"
> flag condition checking at dp_ctrl_reset_irq_ctrl() so that hpd related
> interrupt mask bits are cleared to prevent unexpected from happening.

OK. This is the main part of the description.

> In addition, this patch also add suspended flag to prevent new events be
> added into event Q to wake up event thread after system suspended.

And this describes the major part of the patch. First, it should go into a
separate patch. Second, can you please describe how this can happen once you
disable the HPD interrupt?

>
> Changes in v2:
> -- add more details commit text
>
> Changes in v3:
> -- add synchrons_irq()
> -- add atomic_t suspended
>
> Changes in v4:
> -- correct Fixes's commit ID
> -- remove synchrons_irq()
>
> Changes in v5:
> -- revise commit text
>
> Fixes: 989ebe7bc446 ("drm/msm/dp: do not initialize phy until plugin interrupt received")
> Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
> ---
>  drivers/gpu/drm/msm/dp/dp_ctrl.c    |  9 +++++++--
>  drivers/gpu/drm/msm/dp/dp_display.c | 11 +++++++++++
>  2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
> index af7a80c..f3e333e 100644
> --- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
> +++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
> @@ -1389,8 +1389,13 @@ void dp_ctrl_reset_irq_ctrl(struct dp_ctrl *dp_ctrl, bool enable)
>
>  	dp_catalog_ctrl_reset(ctrl->catalog);
>
> -	if (enable)
> -		dp_catalog_ctrl_enable_irq(ctrl->catalog, enable);
> +	/*
> +	 * all dp controller programmable registers will not
> +	 * be reset to default value after DP_SW_RESET
> +	 * therefore interrupt mask bits have to be updated
> +	 * to enable/disable interrupts
> +	 */
> +	dp_catalog_ctrl_enable_irq(ctrl->catalog, enable);
>  }
>
>  void dp_ctrl_phy_init(struct dp_ctrl *dp_ctrl)
> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
> index c388323..79439b8 100644
> --- a/drivers/gpu/drm/msm/dp/dp_display.c
> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> @@ -98,6 +98,8 @@ struct dp_display_private {
>  	struct dp_ctrl *ctrl;
>  	struct dp_debug *debug;
>
> +	atomic_t suspended;

I think it'd be better to protect it with event_lock rather than using
atomics.

> +
>  	struct dp_usbpd_cb usbpd_cb;
>  	struct dp_display_mode dp_mode;
>  	struct msm_dp dp_display;
> @@ -187,6 +189,11 @@ static int dp_add_event(struct dp_display_private *dp_priv, u32 event,
>  	int pndx;
>
>  	spin_lock_irqsave(&dp_priv->event_lock, flag);
> +	if (atomic_read(&dp_priv->suspended)) {
> +		spin_unlock_irqrestore(&dp_priv->event_lock, flag);
> +		return -EPERM;
> +	}
> +
>  	pndx = dp_priv->event_pndx + 1;
>  	pndx %= DP_EVENT_Q_MAX;
>  	if (pndx == dp_priv->event_gndx) {
> @@ -1362,6 +1369,8 @@ static int dp_pm_resume(struct device *dev)
>  		dp->dp_display.connector_type, dp->core_initialized,
>  		dp->phy_initialized, dp_display->power_on);
>
> +	atomic_set(&dp->suspended, 0);
> +
>  	/* start from disconnected state */
>  	dp->hpd_state = ST_DISCONNECTED;
>
> @@ -1431,6 +1440,8 @@ static int dp_pm_suspend(struct device *dev)
>  		dp->dp_display.connector_type, dp->core_initialized,
>  		dp->phy_initialized, dp_display->power_on);
>
> +	atomic_inc(&dp->suspended);
> +
>  	/* mainlink enabled */
>  	if (dp_power_clk_status(dp->power, DP_CTRL_PM))
>  		dp_ctrl_off_link_stream(dp->ctrl);
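
The event_lock suggestion above can be illustrated with a small userspace sketch. It is not driver code: a pthread mutex stands in for the event_lock spinlock, and the names struct event_queue, EVENT_Q_MAX, queue_init(), queue_set_suspended() and queue_add_event() are hypothetical. The point it tries to show is that once every reader and writer of the flag already holds the lock, a plain bool is enough and no atomic_t is needed.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define EVENT_Q_MAX 8	/* hypothetical; stands in for DP_EVENT_Q_MAX */

struct event_queue {
	pthread_mutex_t lock;	/* stands in for dp_priv->event_lock */
	bool suspended;		/* plain flag, only touched under lock */
	unsigned int pndx;	/* producer index */
	unsigned int gndx;	/* consumer index */
	unsigned int events[EVENT_Q_MAX];
};

static void queue_init(struct event_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->suspended = false;
	q->pndx = 0;
	q->gndx = 0;
}

/* Suspend and resume paths flip the flag while holding the queue lock. */
static void queue_set_suspended(struct event_queue *q, bool suspended)
{
	pthread_mutex_lock(&q->lock);
	q->suspended = suspended;
	pthread_mutex_unlock(&q->lock);
}

/* Producer path: the flag check and the enqueue share one critical section. */
static int queue_add_event(struct event_queue *q, unsigned int event)
{
	unsigned int next;
	int ret = 0;

	pthread_mutex_lock(&q->lock);
	if (q->suspended) {
		ret = -EPERM;	/* same error code the patch returns */
		goto out;
	}
	next = (q->pndx + 1) % EVENT_Q_MAX;
	if (next == q->gndx) {
		ret = -ENOSPC;	/* queue full (error code chosen for the sketch) */
		goto out;
	}
	q->events[q->pndx] = event;
	q->pndx = next;
out:
	pthread_mutex_unlock(&q->lock);
	return ret;
}
```

Because the flag is tested inside the same critical section that advances the producer index, an event cannot slip in between the suspend path setting the flag and the queue being inspected.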
Quoting Dmitry Baryshkov (2022-05-11 17:41:50)
> On 12/05/2022 03:02, Kuogee Hsieh wrote:
> > diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
> > index af7a80c..f3e333e 100644
> > --- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
> > +++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
> > @@ -1389,8 +1389,13 @@ void dp_ctrl_reset_irq_ctrl(struct dp_ctrl *dp_ctrl, bool enable)
> >
> >  	dp_catalog_ctrl_reset(ctrl->catalog);
> >
> > -	if (enable)
> > -		dp_catalog_ctrl_enable_irq(ctrl->catalog, enable);
> > +	/*
> > +	 * all dp controller programmable registers will not
> > +	 * be reset to default value after DP_SW_RESET
> > +	 * therefore interrupt mask bits have to be updated
> > +	 * to enable/disable interrupts
> > +	 */
> > +	dp_catalog_ctrl_enable_irq(ctrl->catalog, enable);
> >  }
> >
> >  void dp_ctrl_phy_init(struct dp_ctrl *dp_ctrl)
> > diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
> > index c388323..79439b8 100644
> > --- a/drivers/gpu/drm/msm/dp/dp_display.c
> > +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> > @@ -98,6 +98,8 @@ struct dp_display_private {
> >  	struct dp_ctrl *ctrl;
> >  	struct dp_debug *debug;
> >
> > +	atomic_t suspended;
>
> I think it'd be better to protect it with event_lock rather than using
> atomics.

Agreed. I think the concern is that the event queue will have "stuff" in
it. If the event queue was all a threaded irq we could simply call
synchronize_irq() after disabling the irq bit in the DP hardware and
then we would know it is safe to power down the DP logic. Unfortunately
the event queue is a kthread so we can't do that and we have to rewrite
synchronize_irq() by checking that the event queue is empty and waiting
for it to empty out otherwise. It's not safe enough to simply do the
power operations underneath the event_lock because there's a queue in
the kthread that might be waiting to grab the event_lock to process.
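
The "check that the event queue is empty and wait for it to empty out otherwise" step described above might look roughly like the following userspace sketch. It is only a model under stated assumptions: a pthread mutex stands in for event_lock, the queue is reduced to a pending counter, the wait is a simple 1 ms poll with a timeout, and struct evq, evq_init(), evq_push(), evq_pop() and evq_wait_empty() are hypothetical names. A real driver would more likely sleep on a wait queue than poll.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() under strict -std modes */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct evq {
	pthread_mutex_t lock;	/* stands in for event_lock */
	unsigned int pending;	/* events queued but not yet processed */
};

static void evq_init(struct evq *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->pending = 0;
}

/* Producer side: queue one event. */
static void evq_push(struct evq *q)
{
	pthread_mutex_lock(&q->lock);
	q->pending++;
	pthread_mutex_unlock(&q->lock);
}

/* Consumer side: what the event thread would do for each event. */
static bool evq_pop(struct evq *q)
{
	bool had_event;

	pthread_mutex_lock(&q->lock);
	had_event = q->pending > 0;
	if (had_event)
		q->pending--;
	pthread_mutex_unlock(&q->lock);
	return had_event;
}

/*
 * Wait until the queue is empty, polling once per millisecond and giving
 * up after timeout_ms. The lock is released while sleeping so that the
 * consumer can take it and make progress; holding it across the wait
 * would reproduce the problem described in the mail above.
 */
static int evq_wait_empty(struct evq *q, unsigned int timeout_ms)
{
	const struct timespec tick = { .tv_sec = 0, .tv_nsec = 1000000 };
	unsigned int waited_ms = 0;

	for (;;) {
		bool empty;

		pthread_mutex_lock(&q->lock);
		empty = (q->pending == 0);
		pthread_mutex_unlock(&q->lock);

		if (empty)
			return 0;
		if (waited_ms >= timeout_ms)
			return -ETIMEDOUT;

		nanosleep(&tick, NULL);
		waited_ms++;
	}
}
```

The suspend path would call evq_wait_empty() after masking the interrupt and before turning off power, and would treat a timeout as an error rather than powering down with work still queued.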
On Thu, 12 May 2022 at 04:01, Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Dmitry Baryshkov (2022-05-11 17:41:50)
> > On 12/05/2022 03:02, Kuogee Hsieh wrote:
> > > diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
> > > index af7a80c..f3e333e 100644
> > > --- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
> > > +++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
> > > @@ -1389,8 +1389,13 @@ void dp_ctrl_reset_irq_ctrl(struct dp_ctrl *dp_ctrl, bool enable)
> > >
> > >  	dp_catalog_ctrl_reset(ctrl->catalog);
> > >
> > > -	if (enable)
> > > -		dp_catalog_ctrl_enable_irq(ctrl->catalog, enable);
> > > +	/*
> > > +	 * all dp controller programmable registers will not
> > > +	 * be reset to default value after DP_SW_RESET
> > > +	 * therefore interrupt mask bits have to be updated
> > > +	 * to enable/disable interrupts
> > > +	 */
> > > +	dp_catalog_ctrl_enable_irq(ctrl->catalog, enable);
> > >  }
> > >
> > >  void dp_ctrl_phy_init(struct dp_ctrl *dp_ctrl)
> > > diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
> > > index c388323..79439b8 100644
> > > --- a/drivers/gpu/drm/msm/dp/dp_display.c
> > > +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> > > @@ -98,6 +98,8 @@ struct dp_display_private {
> > >  	struct dp_ctrl *ctrl;
> > >  	struct dp_debug *debug;
> > >
> > > +	atomic_t suspended;
> >
> > I think it'd be better to protect it with event_lock rather than using
> > atomics.
>
> Agreed. I think the concern is that the event queue will have "stuff" in
> it. If the event queue was all a threaded irq we could simply call
> synchronize_irq() after disabling the irq bit in the DP hardware and
> then we would know it is safe to power down the DP logic. Unfortunately
> the event queue is a kthread so we can't do that and we have to rewrite
> synchronize_irq() by checking that the event queue is empty and waiting
> for it to empty out otherwise. It's not safe enough to simply do the
> power operations underneath the event_lock because there's a queue in
> the kthread that might be waiting to grab the event_lock to process.

This sounds like a good reason to rewrite event_thread to use
threaded_irq and/or workqueue.
On 5/11/2022 6:03 PM, Dmitry Baryshkov wrote:
> On Thu, 12 May 2022 at 04:01, Stephen Boyd <swboyd@chromium.org> wrote:
>> Quoting Dmitry Baryshkov (2022-05-11 17:41:50)
>>> On 12/05/2022 03:02, Kuogee Hsieh wrote:
>>>> diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
>>>> index af7a80c..f3e333e 100644
>>>> --- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
>>>> +++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
>>>> @@ -1389,8 +1389,13 @@ void dp_ctrl_reset_irq_ctrl(struct dp_ctrl *dp_ctrl, bool enable)
>>>>
>>>>  	dp_catalog_ctrl_reset(ctrl->catalog);
>>>>
>>>> -	if (enable)
>>>> -		dp_catalog_ctrl_enable_irq(ctrl->catalog, enable);
>>>> +	/*
>>>> +	 * all dp controller programmable registers will not
>>>> +	 * be reset to default value after DP_SW_RESET
>>>> +	 * therefore interrupt mask bits have to be updated
>>>> +	 * to enable/disable interrupts
>>>> +	 */
>>>> +	dp_catalog_ctrl_enable_irq(ctrl->catalog, enable);
>>>>  }
>>>>
>>>>  void dp_ctrl_phy_init(struct dp_ctrl *dp_ctrl)
>>>> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
>>>> index c388323..79439b8 100644
>>>> --- a/drivers/gpu/drm/msm/dp/dp_display.c
>>>> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
>>>> @@ -98,6 +98,8 @@ struct dp_display_private {
>>>>  	struct dp_ctrl *ctrl;
>>>>  	struct dp_debug *debug;
>>>>
>>>> +	atomic_t suspended;
>>> I think it'd be better to protect it with event_lock rather than using
>>> atomics.
>> Agreed. I think the concern is that the event queue will have "stuff" in
>> it. If the event queue was all a threaded irq we could simply call
>> synchronize_irq() after disabling the irq bit in the DP hardware and
>> then we would know it is safe to power down the DP logic. Unfortunately
>> the event queue is a kthread so we can't do that and we have to rewrite
>> synchronize_irq() by checking that the event queue is empty and waiting
>> for it to empty out otherwise. It's not safe enough to simply do the
>> power operations underneath the event_lock because there's a queue in
>> the kthread that might be waiting to grab the event_lock to process.
> This sounds like a good reason to rewrite event_thread to use
> threaded_irq and/or workqueue.

OK, I will:

1) protect the suspended flag with event_lock to prevent new events from being added
2) disable interrupts
3) wait for event_q to be empty before turning off power
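
The ordering of the three steps listed above can be checked with a tiny single-threaded model. Everything in it is an assumption made for illustration: struct dp_model and the model_*() helpers are hypothetical, the event thread is reduced to a function called inline, and "wait for the queue to empty" is modelled by running that function until nothing is pending. The faults counter records any event handled after the clocks were turned off, which is the crash scenario discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>

struct dp_model {
	bool suspended;		/* set first: rejects new events */
	bool irq_enabled;	/* models the interrupt mask bits */
	bool clocks_on;		/* models controller power and clocks */
	unsigned int pending;	/* queued, unprocessed events */
	unsigned int faults;	/* register accesses with clocks off */
};

/* Interrupt path: queue an event unless masked or suspended. */
static bool model_irq(struct dp_model *m)
{
	if (!m->irq_enabled || m->suspended)
		return false;
	m->pending++;
	return true;
}

/* Event thread body: handle one event, which touches registers. */
static void model_handle_one(struct dp_model *m)
{
	if (m->pending == 0)
		return;
	m->pending--;
	if (!m->clocks_on)
		m->faults++;	/* would be a crash on real hardware */
}

/* Suspend path following the three steps in order. */
static void model_suspend(struct dp_model *m)
{
	m->suspended = true;	/* 1) stop accepting new events */
	m->irq_enabled = false;	/* 2) mask interrupts */
	while (m->pending > 0)	/* 3) drain the queue while clocks are on */
		model_handle_one(m);
	m->clocks_on = false;	/* only now turn off power */
}
```

Swapping the last two statements of model_suspend(), so that power goes off before the drain, makes the faults counter non-zero for any event that was pending at suspend time.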
On 5/11/2022 6:03 PM, Dmitry Baryshkov wrote:
> On Thu, 12 May 2022 at 04:01, Stephen Boyd <swboyd@chromium.org> wrote:
>> Quoting Dmitry Baryshkov (2022-05-11 17:41:50)
>>> On 12/05/2022 03:02, Kuogee Hsieh wrote:
>>>> diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
>>>> index af7a80c..f3e333e 100644
>>>> --- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
>>>> +++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
>>>> @@ -1389,8 +1389,13 @@ void dp_ctrl_reset_irq_ctrl(struct dp_ctrl *dp_ctrl, bool enable)
>>>>
>>>>  	dp_catalog_ctrl_reset(ctrl->catalog);
>>>>
>>>> -	if (enable)
>>>> -		dp_catalog_ctrl_enable_irq(ctrl->catalog, enable);
>>>> +	/*
>>>> +	 * all dp controller programmable registers will not
>>>> +	 * be reset to default value after DP_SW_RESET
>>>> +	 * therefore interrupt mask bits have to be updated
>>>> +	 * to enable/disable interrupts
>>>> +	 */
>>>> +	dp_catalog_ctrl_enable_irq(ctrl->catalog, enable);
>>>>  }
>>>>
>>>>  void dp_ctrl_phy_init(struct dp_ctrl *dp_ctrl)
>>>> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
>>>> index c388323..79439b8 100644
>>>> --- a/drivers/gpu/drm/msm/dp/dp_display.c
>>>> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
>>>> @@ -98,6 +98,8 @@ struct dp_display_private {
>>>>  	struct dp_ctrl *ctrl;
>>>>  	struct dp_debug *debug;
>>>>
>>>> +	atomic_t suspended;
>>> I think it'd be better to protect it with event_lock rather than using
>>> atomics.
>> Agreed. I think the concern is that the event queue will have "stuff" in
>> it. If the event queue was all a threaded irq we could simply call
>> synchronize_irq() after disabling the irq bit in the DP hardware and
>> then we would know it is safe to power down the DP logic. Unfortunately
>> the event queue is a kthread so we can't do that and we have to rewrite
>> synchronize_irq() by checking that the event queue is empty and waiting
>> for it to empty out otherwise. It's not safe enough to simply do the
>> power operations underneath the event_lock because there's a queue in
>> the kthread that might be waiting to grab the event_lock to process.
> This sounds like a good reason to rewrite event_thread to use
> threaded_irq and/or workqueue.

I think we are facing two problems:

1) The event queue is not empty after suspend. This scenario most likely
   will not happen since the display is already off, but it should be fixed
   anyway by checking the "suspended" flag.
2) New events are added after suspend because the irq mask bits were not
   cleared. This scenario is most likely the major culprit, and it is fixed
   by removing the "enable" flag check in dp_ctrl_reset_irq_ctrl().

I will have the "suspended" flag protected by event_lock.
diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
index af7a80c..f3e333e 100644
--- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
+++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
@@ -1389,8 +1389,13 @@ void dp_ctrl_reset_irq_ctrl(struct dp_ctrl *dp_ctrl, bool enable)
 
 	dp_catalog_ctrl_reset(ctrl->catalog);
 
-	if (enable)
-		dp_catalog_ctrl_enable_irq(ctrl->catalog, enable);
+	/*
+	 * all dp controller programmable registers will not
+	 * be reset to default value after DP_SW_RESET
+	 * therefore interrupt mask bits have to be updated
+	 * to enable/disable interrupts
+	 */
+	dp_catalog_ctrl_enable_irq(ctrl->catalog, enable);
 }
 
 void dp_ctrl_phy_init(struct dp_ctrl *dp_ctrl)
diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
index c388323..79439b8 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -98,6 +98,8 @@ struct dp_display_private {
 	struct dp_ctrl *ctrl;
 	struct dp_debug *debug;
 
+	atomic_t suspended;
+
 	struct dp_usbpd_cb usbpd_cb;
 	struct dp_display_mode dp_mode;
 	struct msm_dp dp_display;
@@ -187,6 +189,11 @@ static int dp_add_event(struct dp_display_private *dp_priv, u32 event,
 	int pndx;
 
 	spin_lock_irqsave(&dp_priv->event_lock, flag);
+	if (atomic_read(&dp_priv->suspended)) {
+		spin_unlock_irqrestore(&dp_priv->event_lock, flag);
+		return -EPERM;
+	}
+
 	pndx = dp_priv->event_pndx + 1;
 	pndx %= DP_EVENT_Q_MAX;
 	if (pndx == dp_priv->event_gndx) {
@@ -1362,6 +1369,8 @@ static int dp_pm_resume(struct device *dev)
 		dp->dp_display.connector_type, dp->core_initialized,
 		dp->phy_initialized, dp_display->power_on);
 
+	atomic_set(&dp->suspended, 0);
+
 	/* start from disconnected state */
 	dp->hpd_state = ST_DISCONNECTED;
 
@@ -1431,6 +1440,8 @@ static int dp_pm_suspend(struct device *dev)
 		dp->dp_display.connector_type, dp->core_initialized,
 		dp->phy_initialized, dp_display->power_on);
 
+	atomic_inc(&dp->suspended);
+
 	/* mainlink enabled */
 	if (dp_power_clk_status(dp->power, DP_CTRL_PM))
 		dp_ctrl_off_link_stream(dp->ctrl);
dp_catalog_ctrl_reset() performs a software reset of the DP controller, but
it does not reset programmable registers to their default values. The DP
driver therefore still has to clear the mask bits in the interrupt status
registers to disable interrupts after a software reset of the controller.

In the current implementation, dp_ctrl_reset_irq_ctrl() software-resets the
DP controller but does not call dp_catalog_ctrl_enable_irq(false) to clear
the HPD-related interrupt mask bits, because it mistakenly assumes that the
software reset clears those bits automatically. This mistake may crash the
system during suspend: an unexpected irq fires and triggers the event thread
to access DP controller registers while the controller clocks are disabled.

This patch fixes the crash during suspend by removing the "enable" flag
check in dp_ctrl_reset_irq_ctrl() so that the HPD-related interrupt mask
bits are cleared and unexpected interrupts are prevented.

In addition, this patch adds a suspended flag to prevent new events from
being added to the event queue and waking up the event thread after the
system has suspended.

Changes in v2:
-- add more details to the commit text

Changes in v3:
-- add synchronize_irq()
-- add atomic_t suspended

Changes in v4:
-- correct the commit ID in the Fixes tag
-- remove synchronize_irq()

Changes in v5:
-- revise commit text

Fixes: 989ebe7bc446 ("drm/msm/dp: do not initialize phy until plugin interrupt received")
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
---
 drivers/gpu/drm/msm/dp/dp_ctrl.c    |  9 +++++++--
 drivers/gpu/drm/msm/dp/dp_display.c | 11 +++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)
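
The behaviour the commit message describes can be reproduced in a small register model. This is a sketch under assumptions, not the driver's register map: struct dp_regs, HPD_INT_MASK, link_state and the model_*() and reset_irq_ctrl_*() helpers are all hypothetical. The model encodes only the stated fact that a software reset leaves programmable registers, here the interrupt mask, untouched, and compares the old conditional call with the new unconditional one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HPD_INT_MASK 0x0fu	/* hypothetical mask-bit layout */

struct dp_regs {
	uint32_t int_mask;	/* programmable: survives software reset */
	uint32_t link_state;	/* internal state: cleared by software reset */
};

/* Models dp_catalog_ctrl_reset(): internal state is reset, mask is not. */
static void model_sw_reset(struct dp_regs *r)
{
	r->link_state = 0;
	/* int_mask is deliberately left untouched */
}

/* Models dp_catalog_ctrl_enable_irq(): set or clear the mask bits. */
static void model_enable_irq(struct dp_regs *r, bool enable)
{
	r->int_mask = enable ? HPD_INT_MASK : 0;
}

/* Behaviour before the patch: the mask is only written when enabling. */
static void reset_irq_ctrl_old(struct dp_regs *r, bool enable)
{
	model_sw_reset(r);
	if (enable)
		model_enable_irq(r, enable);
}

/* Behaviour after the patch: the mask is always written. */
static void reset_irq_ctrl_new(struct dp_regs *r, bool enable)
{
	model_sw_reset(r);
	model_enable_irq(r, enable);
}
```

Calling reset_irq_ctrl_old(&regs, false) on a model whose mask bits are set leaves them set, so the modelled interrupt source stays live across suspend; reset_irq_ctrl_new(&regs, false) clears them.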