diff mbox series

[v3] drm/msm/dp: check hpd_state before push idle pattern at dp_bridge_disable()

Message ID 1660159551-13828-1-git-send-email-quic_khsieh@quicinc.com (mailing list archive)
State Not Applicable
Headers show
Series [v3] drm/msm/dp: check hpd_state before push idle pattern at dp_bridge_disable() | expand

Commit Message

Kuogee Hsieh Aug. 10, 2022, 7:25 p.m. UTC
dp_bridge_disable() is the first step toward tearing down main link.
Its major function is to start transmitting idle pattern to replace
video stream. This patch will check hpd_state to make sure main link
is enabled before commit changes of main link's configuration to
push idle pattern out to avoid system crashing due to main link clock
is disabled while access main link registers.

SError Interrupt on CPU7, code 0x00000000be000411 -- SError
CPU: 7 PID: 3878 Comm: Xorg Not tainted 5.19.0-stb-cbq #19
Hardware name: Google Lazor (rev3 - 8) (DT)
pstate: a04000c9 (NzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __cmpxchg_case_acq_32+0x14/0x2c
lr : do_raw_spin_lock+0xa4/0xdc
sp : ffffffc01092b6a0
x29: ffffffc01092b6a0 x28: 0000000000000028 x27: 0000000000000038
x26: 0000000000000004 x25: ffffffd2973dce48 x24: 0000000000000000
x23: 00000000ffffffff x22: 00000000ffffffff x21: ffffffd2978d0008
x20: ffffffd2978d0008 x19: ffffff80ff759fc0 x18: 0000000000000000
x17: 004800a501260460 x16: 0441043b04600438 x15: 04380000089807d0
x14: 07b0089807800780 x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000438 x10: 00000000000007d0 x9 : ffffffd2973e09e4
x8 : ffffff8092d53300 x7 : ffffff808902e8b8 x6 : 0000000000000001
x5 : ffffff808902e880 x4 : 0000000000000000 x3 : ffffff80ff759fc0
x2 : 0000000000000001 x1 : 0000000000000000 x0 : ffffff80ff759fc0
Kernel panic - not syncing: Asynchronous SError Interrupt
CPU: 7 PID: 3878 Comm: Xorg Not tainted 5.19.0-stb-cbq #19
Hardware name: Google Lazor (rev3 - 8) (DT)
Call trace:
 dump_backtrace.part.0+0xbc/0xe4
 show_stack+0x24/0x70
 dump_stack_lvl+0x68/0x84
 dump_stack+0x18/0x34
 panic+0x14c/0x32c
 nmi_panic+0x58/0x7c
 arm64_serror_panic+0x78/0x84
 do_serror+0x40/0x64
 el1h_64_error_handler+0x30/0x48
 el1h_64_error+0x68/0x6c
 __cmpxchg_case_acq_32+0x14/0x2c
 _raw_spin_lock_irqsave+0x38/0x4c
 lock_timer_base+0x40/0x78
 __mod_timer+0xf4/0x25c
 schedule_timeout+0xd4/0xfc
 __wait_for_common+0xac/0x140
 wait_for_completion_timeout+0x2c/0x54
 dp_ctrl_push_idle+0x40/0x88
 dp_bridge_disable+0x24/0x30
 drm_atomic_bridge_chain_disable+0x90/0xbc
 drm_atomic_helper_commit_modeset_disables+0x198/0x444
 msm_atomic_commit_tail+0x1d0/0x374
 commit_tail+0x80/0x108
 drm_atomic_helper_commit+0x118/0x11c
 drm_atomic_commit+0xb4/0xe0
 drm_client_modeset_commit_atomic+0x184/0x224
 drm_client_modeset_commit_locked+0x58/0x160
 drm_client_modeset_commit+0x3c/0x64
 __drm_fb_helper_restore_fbdev_mode_unlocked+0x98/0xac
 drm_fb_helper_set_par+0x74/0x80
 drm_fb_helper_hotplug_event+0xdc/0xe0
 __drm_fb_helper_restore_fbdev_mode_unlocked+0x7c/0xac
 drm_fb_helper_restore_fbdev_mode_unlocked+0x20/0x2c
 drm_fb_helper_lastclose+0x20/0x2c
 drm_lastclose+0x44/0x6c
 drm_release+0x88/0xd4
 __fput+0x104/0x220
 ____fput+0x1c/0x28
 task_work_run+0x8c/0x100
 do_exit+0x450/0x8d0
 do_group_exit+0x40/0xac
 __wake_up_parent+0x0/0x38
 invoke_syscall+0x84/0x11c
 el0_svc_common.constprop.0+0xb8/0xe4
 do_el0_svc+0x8c/0xb8
 el0_svc+0x2c/0x54
 el0t_64_sync_handler+0x120/0x1c0
 el0t_64_sync+0x190/0x194
SMP: stopping secondary CPUs
Kernel Offset: 0x128e800000 from 0xffffffc008000000
PHYS_OFFSET: 0x80000000
CPU features: 0x800,00c2a015,19801c82
Memory Limit: none

Changes in v3:
-- correct Reported-by
-- add call stack trace

Changes in v2:
-- changes Fixes patch
-- fix eported-by
-- add Closes tag

Fixes: 375a126090b9 ("drm/msm/dp: tear down main link at unplug handle immediately")
Reported-by: Leonard Lausen <leonard@lausen.nl>
Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/17
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
---
 drivers/gpu/drm/msm/dp/dp_display.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Stephen Boyd Aug. 10, 2022, 10:22 p.m. UTC | #1
Quoting Kuogee Hsieh (2022-08-10 12:25:51)
> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
> index b36f8b6..678289a 100644
> --- a/drivers/gpu/drm/msm/dp/dp_display.c
> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> @@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge *drm_bridge)
>         struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
>         struct msm_dp *dp = dp_bridge->dp_display;
>         struct dp_display_private *dp_display;
> +       u32 state;
>
>         dp_display = container_of(dp, struct dp_display_private, dp_display);
>
> +       mutex_lock(&dp_display->event_mutex);
> +
> +       state = dp_display->hpd_state;
> +       if (state != ST_DISCONNECT_PENDING && state != ST_CONNECTED) {

It's concerning that we have to check this at all. Are we still
interjecting into the disable path when the cable is disconnected?

> +               mutex_unlock(&dp_display->event_mutex);
> +               return;
> +       }
> +
>         dp_ctrl_push_idle(dp_display->ctrl);
> +       mutex_unlock(&dp_display->event_mutex);
Kuogee Hsieh Aug. 10, 2022, 11:57 p.m. UTC | #2
On 8/10/2022 3:22 PM, Stephen Boyd wrote:
> Quoting Kuogee Hsieh (2022-08-10 12:25:51)
>> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
>> index b36f8b6..678289a 100644
>> --- a/drivers/gpu/drm/msm/dp/dp_display.c
>> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
>> @@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge *drm_bridge)
>>          struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
>>          struct msm_dp *dp = dp_bridge->dp_display;
>>          struct dp_display_private *dp_display;
>> +       u32 state;
>>
>>          dp_display = container_of(dp, struct dp_display_private, dp_display);
>>
>> +       mutex_lock(&dp_display->event_mutex);
>> +
>> +       state = dp_display->hpd_state;
>> +       if (state != ST_DISCONNECT_PENDING && state != ST_CONNECTED) {
> It's concerning that we have to check this at all. Are we still
> interjecting into the disable path when the cable is disconnected?

yes,

The problem is not from cable disconnected.

There is a corner case that this function is called at drm shutdown 
(drm_release).

At that time, mainlink is not enabled, hence dp_ctrl_push_idle() will 
cause system crash.



>> +               mutex_unlock(&dp_display->event_mutex);
>> +               return;
>> +       }
>> +
>>          dp_ctrl_push_idle(dp_display->ctrl);
>> +       mutex_unlock(&dp_display->event_mutex);
Stephen Boyd Aug. 11, 2022, 12:09 a.m. UTC | #3
Quoting Kuogee Hsieh (2022-08-10 16:57:51)
>
> On 8/10/2022 3:22 PM, Stephen Boyd wrote:
> > Quoting Kuogee Hsieh (2022-08-10 12:25:51)
> >> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
> >> index b36f8b6..678289a 100644
> >> --- a/drivers/gpu/drm/msm/dp/dp_display.c
> >> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> >> @@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge *drm_bridge)
> >>          struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
> >>          struct msm_dp *dp = dp_bridge->dp_display;
> >>          struct dp_display_private *dp_display;
> >> +       u32 state;
> >>
> >>          dp_display = container_of(dp, struct dp_display_private, dp_display);
> >>
> >> +       mutex_lock(&dp_display->event_mutex);
> >> +
> >> +       state = dp_display->hpd_state;
> >> +       if (state != ST_DISCONNECT_PENDING && state != ST_CONNECTED) {
> > It's concerning that we have to check this at all. Are we still
> > interjecting into the disable path when the cable is disconnected?
>
> yes,
>
> The problem is not from cable disconnected.
>
> There is a corner case that this function is called at drm shutdown
> (drm_release).
>
> At that time, mainlink is not enabled, hence dp_ctrl_push_idle() will
> cause system crash.

The mainlink is only disabled when the cable is disconnected though?

Let me put it this way, if we have to check that the state is
"connected" or "disconnected pending" in the disable path then there's
an issue where this driver is being called in unexpected ways. This
driver is fighting the drm core each time there's a state check. We
really need to get rid of the state tracking entirely, and make sure
that the drm core is calling into the driver at the right time, i.e.
bridge disable is only called when the mainlink is enabled, etc.
Abhinav Kumar Aug. 11, 2022, 1 a.m. UTC | #4
Hi Stephen

On 8/10/2022 5:09 PM, Stephen Boyd wrote:
> Quoting Kuogee Hsieh (2022-08-10 16:57:51)
>>
>> On 8/10/2022 3:22 PM, Stephen Boyd wrote:
>>> Quoting Kuogee Hsieh (2022-08-10 12:25:51)
>>>> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
>>>> index b36f8b6..678289a 100644
>>>> --- a/drivers/gpu/drm/msm/dp/dp_display.c
>>>> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
>>>> @@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge *drm_bridge)
>>>>           struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
>>>>           struct msm_dp *dp = dp_bridge->dp_display;
>>>>           struct dp_display_private *dp_display;
>>>> +       u32 state;
>>>>
>>>>           dp_display = container_of(dp, struct dp_display_private, dp_display);
>>>>
>>>> +       mutex_lock(&dp_display->event_mutex);
>>>> +
>>>> +       state = dp_display->hpd_state;
>>>> +       if (state != ST_DISCONNECT_PENDING && state != ST_CONNECTED) {
>>> It's concerning that we have to check this at all. Are we still
>>> interjecting into the disable path when the cable is disconnected?
>>
>> yes,
>>
>> The problem is not from cable disconnected.
>>
>> There is a corner case that this function is called at drm shutdown
>> (drm_release).
>>
>> At that time, mainlink is not enabled, hence dp_ctrl_push_idle() will
>> cause system crash.
> 
> The mainlink is only disabled when the cable is disconnected though?
> 
> Let me put it this way, if we have to check that the state is
> "connected" or "disconnected pending" in the disable path then there's
> an issue where this driver is being called in unexpected ways. This
> driver is fighting the drm core each time there's a state check. We
> really need to get rid of the state tracking entirely, and make sure
> that the drm core is calling into the driver at the right time, i.e.
> bridge disable is only called when the mainlink is enabled, etc.

So if link training failed, we do not send a uevent to usermode and will 
bail out here:

         rc = dp_ctrl_on_link(dp->ctrl);
         if (rc) {
                 DRM_ERROR("failed to complete DP link training\n");
                 goto end;
         }

So this commit is not coming from usermode but from the drm_release() path.

Even then, you do have a valid point. DRM framework should not have 
caused the disable path to happen without an enable.

I went through the stack mentioned in the issue.

Lets see this part of the stack:

dp_ctrl_push_idle+0x40/0x88
  dp_bridge_disable+0x24/0x30
  drm_atomic_bridge_chain_disable+0x90/0xbc
  drm_atomic_helper_commit_modeset_disables+0x198/0x444
  msm_atomic_commit_tail+0x1d0/0x374

In drm_atomic_helper_commit_modeset_disables(), we call disable_outputs().

AFAICT, this is the only place which has a protection to not call the 
disable() flow if it was not enabled here:

https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063

But that function is only checking crtc_state->active. Thats set by the 
usermode:

https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407

Now, if usermode sets that to true and then crashed it can bypass this 
check and we will crash in the location kuogee is trying to fix.

 From the issue mentioned in 
https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did 
mention the usermode crashed.

So this is my tentative analysis of whats happening here.

Ideally yes, we should have been protected by the location mentioned 
above in disable_outputs() but looks to me due to the above hypothesis 
its getting bypassed.

Thanks

Abhinav
Kuogee Hsieh Aug. 11, 2022, 3:20 p.m. UTC | #5
On 8/10/2022 6:00 PM, Abhinav Kumar wrote:
> Hi Stephen
>
> On 8/10/2022 5:09 PM, Stephen Boyd wrote:
>> Quoting Kuogee Hsieh (2022-08-10 16:57:51)
>>>
>>> On 8/10/2022 3:22 PM, Stephen Boyd wrote:
>>>> Quoting Kuogee Hsieh (2022-08-10 12:25:51)
>>>>> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
>>>>> b/drivers/gpu/drm/msm/dp/dp_display.c
>>>>> index b36f8b6..678289a 100644
>>>>> --- a/drivers/gpu/drm/msm/dp/dp_display.c
>>>>> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
>>>>> @@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge 
>>>>> *drm_bridge)
>>>>>           struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
>>>>>           struct msm_dp *dp = dp_bridge->dp_display;
>>>>>           struct dp_display_private *dp_display;
>>>>> +       u32 state;
>>>>>
>>>>>           dp_display = container_of(dp, struct dp_display_private, 
>>>>> dp_display);
>>>>>
>>>>> +       mutex_lock(&dp_display->event_mutex);
>>>>> +
>>>>> +       state = dp_display->hpd_state;
>>>>> +       if (state != ST_DISCONNECT_PENDING && state != 
>>>>> ST_CONNECTED) {
>>>> It's concerning that we have to check this at all. Are we still
>>>> interjecting into the disable path when the cable is disconnected?
>>>
>>> yes,
>>>
>>> The problem is not from cable disconnected.
>>>
>>> There is a corner case that this function is called at drm shutdown
>>> (drm_release).
>>>
>>> At that time, mainlink is not enabled, hence dp_ctrl_push_idle() will
>>> cause system crash.
>>
>> The mainlink is only disabled when the cable is disconnected though?
>>
>> Let me put it this way, if we have to check that the state is
>> "connected" or "disconnected pending" in the disable path then there's
>> an issue where this driver is being called in unexpected ways. This
>> driver is fighting the drm core each time there's a state check. We
>> really need to get rid of the state tracking entirely, and make sure
>> that the drm core is calling into the driver at the right time, i.e.
>> bridge disable is only called when the mainlink is enabled, etc.
>
> So if link training failed, we do not send a uevent to usermode and 
> will bail out here:
>
>         rc = dp_ctrl_on_link(dp->ctrl);
>         if (rc) {
>                 DRM_ERROR("failed to complete DP link training\n");
>                 goto end;
>         }
>
> So this commit is not coming from usermode but from the drm_release() 
> path.
>
> Even then, you do have a valid point. DRM framework should not have 
> caused the disable path to happen without an enable.
>
> I went through the stack mentioned in the issue.
>
> Lets see this part of the stack:
>
> dp_ctrl_push_idle+0x40/0x88
>  dp_bridge_disable+0x24/0x30
>  drm_atomic_bridge_chain_disable+0x90/0xbc
>  drm_atomic_helper_commit_modeset_disables+0x198/0x444
>  msm_atomic_commit_tail+0x1d0/0x374
>
> In drm_atomic_helper_commit_modeset_disables(), we call 
> disable_outputs().
>
> AFAICT, this is the only place which has a protection to not call the 
> disable() flow if it was not enabled here:
>
> https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063 
>
>
> But that function is only checking crtc_state->active. Thats set by 
> the usermode:
>
> https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407 
>
>
> Now, if usermode sets that to true and then crashed it can bypass this 
> check and we will crash in the location kuogee is trying to fix.
>
> From the issue mentioned in 
> https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did 
> mention the usermode crashed.
>
> So this is my tentative analysis of whats happening here.
>
> Ideally yes, we should have been protected by the location mentioned 
> above in disable_outputs() but looks to me due to the above hypothesis 
> its getting bypassed.
>
> Thanks
>
> Abhinav
>
>
Ii sound likes that there is a hole either at user space or drm.

But that should not cause dp_bridge_disable() at dp driver to crash.

Therefore it is properly to check hdp_state condition at 
dp_bridge_disable() to prevent it from crashing.
Stephen Boyd Aug. 15, 2022, 5:08 p.m. UTC | #6
Quoting Kuogee Hsieh (2022-08-11 08:20:01)
>
> On 8/10/2022 6:00 PM, Abhinav Kumar wrote:
> >
> > Even then, you do have a valid point. DRM framework should not have
> > caused the disable path to happen without an enable.
> >
> > I went through the stack mentioned in the issue.
> >
> > Lets see this part of the stack:
> >
> > dp_ctrl_push_idle+0x40/0x88
> >  dp_bridge_disable+0x24/0x30
> >  drm_atomic_bridge_chain_disable+0x90/0xbc
> >  drm_atomic_helper_commit_modeset_disables+0x198/0x444
> >  msm_atomic_commit_tail+0x1d0/0x374
> >
> > In drm_atomic_helper_commit_modeset_disables(), we call
> > disable_outputs().
> >
> > AFAICT, this is the only place which has a protection to not call the
> > disable() flow if it was not enabled here:
> >
> > https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063
> >
> >
> > But that function is only checking crtc_state->active. Thats set by
> > the usermode:
> >
> > https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407
> >
> >
> > Now, if usermode sets that to true and then crashed it can bypass this
> > check and we will crash in the location kuogee is trying to fix.

That seems bad, no? We don't want userspace to be able to crash and then
be able to call the disable path when enable never succeeded.

> >
> > From the issue mentioned in
> > https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did
> > mention the usermode crashed.
> >
> > So this is my tentative analysis of whats happening here.
> >
> > Ideally yes, we should have been protected by the location mentioned
> > above in disable_outputs() but looks to me due to the above hypothesis
> > its getting bypassed.

Can you fix the problem there? Not fixing it means that every driver out
there has to develop the same "fix", when it could be fixed in the core
one time.

Ideally drivers are simple. They configure the hardware for what the
function pointer is asking for. State management and things like that
should be pushed into the core framework so that we don't have to
duplicate that multiple times.

> >
> > Thanks
> >
> > Abhinav
> >
> >
> Ii sound likes that there is a hole either at user space or drm.
>
> But that should not cause dp_bridge_disable() at dp driver to crash.

Agreed.

>
> Therefore it is properly to check hdp_state condition at
> dp_bridge_disable() to prevent it from crashing.
>

Disagree. Userspace shouldn't be able to get drm into a wedged state.
Kuogee Hsieh Aug. 16, 2022, 4:31 p.m. UTC | #7
On 8/15/2022 10:08 AM, Stephen Boyd wrote:
> Quoting Kuogee Hsieh (2022-08-11 08:20:01)
>> On 8/10/2022 6:00 PM, Abhinav Kumar wrote:
>>> Even then, you do have a valid point. DRM framework should not have
>>> caused the disable path to happen without an enable.
>>>
>>> I went through the stack mentioned in the issue.
>>>
>>> Lets see this part of the stack:
>>>
>>> dp_ctrl_push_idle+0x40/0x88
>>>   dp_bridge_disable+0x24/0x30
>>>   drm_atomic_bridge_chain_disable+0x90/0xbc
>>>   drm_atomic_helper_commit_modeset_disables+0x198/0x444
>>>   msm_atomic_commit_tail+0x1d0/0x374
>>>
>>> In drm_atomic_helper_commit_modeset_disables(), we call
>>> disable_outputs().
>>>
>>> AFAICT, this is the only place which has a protection to not call the
>>> disable() flow if it was not enabled here:
>>>
>>> https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063
>>>
>>>
>>> But that function is only checking crtc_state->active. Thats set by
>>> the usermode:
>>>
>>> https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407
>>>
>>>
>>> Now, if usermode sets that to true and then crashed it can bypass this
>>> check and we will crash in the location kuogee is trying to fix.
> That seems bad, no? We don't want userspace to be able to crash and then
> be able to call the disable path when enable never succeeded.
>
>>>  From the issue mentioned in
>>> https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did
>>> mention the usermode crashed.
>>>
>>> So this is my tentative analysis of whats happening here.
>>>
>>> Ideally yes, we should have been protected by the location mentioned
>>> above in disable_outputs() but looks to me due to the above hypothesis
>>> its getting bypassed.
> Can you fix the problem there? Not fixing it means that every driver out
> there has to develop the same "fix", when it could be fixed in the core
> one time.

The reporter is running GNOME Display Manager (gdm3) instead of chrome.

We can not duplicate this problem locally. Hence we can not confirm this 
is the root cause of this bug or not.

Do you know who is more proper to investigate more into this?

> Ideally drivers are simple. They configure the hardware for what the
> function pointer is asking for. State management and things like that
> should be pushed into the core framework so that we don't have to
> duplicate that multiple times.
>
>>> Thanks
>>>
>>> Abhinav
>>>
>>>
>> Ii sound likes that there is a hole either at user space or drm.
>>
>> But that should not cause dp_bridge_disable() at dp driver to crash.
> Agreed.
>
>> Therefore it is properly to check hdp_state condition at
>> dp_bridge_disable() to prevent it from crashing.
>>
> Disagree. Userspace shouldn't be able to get drm into a wedged state.

not sure for this.

I agree if this is simple driver such as i2c which does not need to 
maintain any states during operation.

but mdp/dp is far more complexity. we really do not want to have any 
crash happen at mdp/dp in the filed.

would you please reconsider our point to add this hdp_state checking 
here to prevent any crash happen.
Abhinav Kumar Aug. 18, 2022, 4:06 p.m. UTC | #8
Hi Stephen

On 8/15/2022 10:08 AM, Stephen Boyd wrote:
> Quoting Kuogee Hsieh (2022-08-11 08:20:01)
>>
>> On 8/10/2022 6:00 PM, Abhinav Kumar wrote:
>>>
>>> Even then, you do have a valid point. DRM framework should not have
>>> caused the disable path to happen without an enable.
>>>
>>> I went through the stack mentioned in the issue.
>>>
>>> Lets see this part of the stack:
>>>
>>> dp_ctrl_push_idle+0x40/0x88
>>>   dp_bridge_disable+0x24/0x30
>>>   drm_atomic_bridge_chain_disable+0x90/0xbc
>>>   drm_atomic_helper_commit_modeset_disables+0x198/0x444
>>>   msm_atomic_commit_tail+0x1d0/0x374
>>>
>>> In drm_atomic_helper_commit_modeset_disables(), we call
>>> disable_outputs().
>>>
>>> AFAICT, this is the only place which has a protection to not call the
>>> disable() flow if it was not enabled here:
>>>
>>> https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063
>>>
>>>
>>> But that function is only checking crtc_state->active. Thats set by
>>> the usermode:
>>>
>>> https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407
>>>
>>>
>>> Now, if usermode sets that to true and then crashed it can bypass this
>>> check and we will crash in the location kuogee is trying to fix.
> 
> That seems bad, no? We don't want userspace to be able to crash and then
> be able to call the disable path when enable never succeeded.
> 
>>>
>>>  From the issue mentioned in
>>> https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did
>>> mention the usermode crashed.
>>>
>>> So this is my tentative analysis of whats happening here.
>>>
>>> Ideally yes, we should have been protected by the location mentioned
>>> above in disable_outputs() but looks to me due to the above hypothesis
>>> its getting bypassed.
> 
> Can you fix the problem there? Not fixing it means that every driver out
> there has to develop the same "fix", when it could be fixed in the core
> one time.
> 

As per discussion on IRC with Rob, we have pushed another fix for this 
issue 
https://lore.kernel.org/all/1660759314-28088-1-git-send-email-quic_khsieh@quicinc.com/.

So, we can drop this one in favor of the other.

Thanks

Abhinav
> Ideally drivers are simple. They configure the hardware for what the
> function pointer is asking for. State management and things like that
> should be pushed into the core framework so that we don't have to
> duplicate that multiple times.
> 
>>>
>>> Thanks
>>>
>>> Abhinav
>>>
>>>
>> Ii sound likes that there is a hole either at user space or drm.
>>
>> But that should not cause dp_bridge_disable() at dp driver to crash.
> 
> Agreed.
> 
>>
>> Therefore it is properly to check hdp_state condition at
>> dp_bridge_disable() to prevent it from crashing.
>>
> 
> Disagree. Userspace shouldn't be able to get drm into a wedged state.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
index b36f8b6..678289a 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -1729,10 +1729,20 @@  void dp_bridge_disable(struct drm_bridge *drm_bridge)
 	struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
 	struct msm_dp *dp = dp_bridge->dp_display;
 	struct dp_display_private *dp_display;
+	u32 state;
 
 	dp_display = container_of(dp, struct dp_display_private, dp_display);
 
+	mutex_lock(&dp_display->event_mutex);
+
+	state = dp_display->hpd_state;
+	if (state != ST_DISCONNECT_PENDING && state != ST_CONNECTED) {
+		mutex_unlock(&dp_display->event_mutex);
+		return;
+	}
+
 	dp_ctrl_push_idle(dp_display->ctrl);
+	mutex_unlock(&dp_display->event_mutex);
 }
 
 void dp_bridge_post_disable(struct drm_bridge *drm_bridge)