diff mbox series

[v4] drm/msm/dpu1: don't choke on disabling the writeback connector

Message ID 20241209-dpu-fix-wb-v4-1-7fe93059f9e0@linaro.org (mailing list archive)
State Not Applicable
Headers show
Series [v4] drm/msm/dpu1: don't choke on disabling the writeback connector | expand

Commit Message

Dmitry Baryshkov Dec. 9, 2024, 10:04 a.m. UTC
During suspend/resume process all connectors are explicitly disabled and
then reenabled. However resume fails because of the connector_status check:

[dpu error]connector not connected 3
[drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)

It doesn't make sense to check for the Writeback connected status (and
other drivers don't perform such check), so drop the check.

It wasn't a problem before the commit 71174f362d67 ("drm/msm/dpu: move
writeback's atomic_check to dpu_writeback.c"), since encoder's
atomic_check() is called under a different conditions that the
connector's atomic_check() (e.g. it is not called if there is no
connected CRTC or if the corresponding connector is not a part of the
new state).

Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
Cc: stable@vger.kernel.org
Reported-by: Leonard Lausen <leonard@lausen.nl>
Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
Tested-by: Leonard Lausen <leonard@lausen.nl> # on sc7180 lazor
Reported-by: György Kurucz <me@kuruczgy.com>
Link: https://lore.kernel.org/all/b70a4d1d-f98f-4169-942c-cb9006a42b40@kuruczgy.com/
Reported-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/all/ZzyYI8KkWK36FfXf@hovoldconsulting.com/
Tested-by: György Kurucz <me@kuruczgy.com>
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
---
Leonard Lausen reported an issue with suspend/resume of the sc7180
devices. Fix the WB atomic check, which caused the issue.
---
Changes in v4:
- Epanded commit message (Johan)
- Link to v3: https://lore.kernel.org/r/20241208-dpu-fix-wb-v3-1-a1de69ce4a1b@linaro.org

Changes in v3:
- Rebased on top of msm-fixes
- Link to v2: https://lore.kernel.org/r/20240802-dpu-fix-wb-v2-0-7eac9eb8e895@linaro.org

Changes in v2:
- Reworked the writeback to just drop the connector->status check.
- Expanded commit message for the debugging patch.
- Link to v1: https://lore.kernel.org/r/20240709-dpu-fix-wb-v1-0-448348bfd4cb@linaro.org
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
 1 file changed, 3 deletions(-)


---
base-commit: 86313a9cd152330c634b25d826a281c6a002eb77
change-id: 20240709-dpu-fix-wb-6cd57e3eb182

Best regards,

Comments

Abhinav Kumar Dec. 9, 2024, 7:54 p.m. UTC | #1
On 12/9/2024 2:04 AM, Dmitry Baryshkov wrote:
> During suspend/resume process all connectors are explicitly disabled and
> then reenabled. However resume fails because of the connector_status check:
> 
> [dpu error]connector not connected 3
> [drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
> 
> It doesn't make sense to check for the Writeback connected status (and
> other drivers don't perform such check), so drop the check.
> 
> It wasn't a problem before the commit 71174f362d67 ("drm/msm/dpu: move
> writeback's atomic_check to dpu_writeback.c"), since encoder's
> atomic_check() is called under a different conditions that the
> connector's atomic_check() (e.g. it is not called if there is no
> connected CRTC or if the corresponding connector is not a part of the
> new state).
> 
> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
> Cc: stable@vger.kernel.org
> Reported-by: Leonard Lausen <leonard@lausen.nl>
> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
> Tested-by: Leonard Lausen <leonard@lausen.nl> # on sc7180 lazor
> Reported-by: György Kurucz <me@kuruczgy.com>
> Link: https://lore.kernel.org/all/b70a4d1d-f98f-4169-942c-cb9006a42b40@kuruczgy.com/
> Reported-by: Johan Hovold <johan+linaro@kernel.org>
> Link: https://lore.kernel.org/all/ZzyYI8KkWK36FfXf@hovoldconsulting.com/
> Tested-by: György Kurucz <me@kuruczgy.com>
> Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
> Tested-by: Johan Hovold <johan+linaro@kernel.org>
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
> Leonard Lausen reported an issue with suspend/resume of the sc7180
> devices. Fix the WB atomic check, which caused the issue.
> ---
> Changes in v4:
> - Epanded commit message (Johan)
> - Link to v3: https://lore.kernel.org/r/20241208-dpu-fix-wb-v3-1-a1de69ce4a1b@linaro.org
> 
> Changes in v3:
> - Rebased on top of msm-fixes
> - Link to v2: https://lore.kernel.org/r/20240802-dpu-fix-wb-v2-0-7eac9eb8e895@linaro.org
> 
> Changes in v2:
> - Reworked the writeback to just drop the connector->status check.
> - Expanded commit message for the debugging patch.
> - Link to v1: https://lore.kernel.org/r/20240709-dpu-fix-wb-v1-0-448348bfd4cb@linaro.org
> ---
>   drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
>   1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> index 16f144cbc0c986ee266412223d9e605b01f9fb8c..8ff496082902b1ee713e806140f39b4730ed256a 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
>   	if (!conn_state || !conn_state->connector) {
>   		DPU_ERROR("invalid connector state\n");
>   		return -EINVAL;
> -	} else if (conn_state->connector->status != connector_status_connected) {
> -		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
> -		return -EINVAL;
>   	}

Can you please explain me about why the status was not already connected?

In all the places I checked (unless I missed something), if the detect() 
callback is not registered, the connector is assumed connected like below:

404 	if (funcs->detect_ctx)
405 		ret = funcs->detect_ctx(connector, ctx, force);
406 	else if (connector->funcs->detect)
407 		ret = connector->funcs->detect(connector, force);
408 	else
409 		ret = connector_status_connected;

We do not register .detect for WB as WB connector should be always 
connected.

What scenario or use-case is making the connector status to something 
other than connected?

The places which mark the connector as unknown seem to be covered by 
warnings in drm_probe_helper.c but the bug report doesnt seem to be 
hitting those.

I am wondering if there is some case in fwk resetting this to unknown 
silently (which is incorrect) and perhaps other drivers dont hit this as 
there is a .detect registered which always returns connected and MSM 
should follow that.

111 static enum drm_connector_status
112 komeda_wb_connector_detect(struct drm_connector *connector, bool force)
113 {
114 	return connector_status_connected;
115 }
116
>   
>   	crtc = conn_state->crtc;
> 
> ---
> base-commit: 86313a9cd152330c634b25d826a281c6a002eb77
> change-id: 20240709-dpu-fix-wb-6cd57e3eb182
> 
> Best regards,
Dmitry Baryshkov Dec. 9, 2024, 9:55 p.m. UTC | #2
On Mon, 9 Dec 2024 at 21:54, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
>
>
>
> On 12/9/2024 2:04 AM, Dmitry Baryshkov wrote:
> > During suspend/resume process all connectors are explicitly disabled and
> > then reenabled. However resume fails because of the connector_status check:
> >
> > [dpu error]connector not connected 3
> > [drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
> >
> > It doesn't make sense to check for the Writeback connected status (and
> > other drivers don't perform such check), so drop the check.
> >
> > It wasn't a problem before the commit 71174f362d67 ("drm/msm/dpu: move
> > writeback's atomic_check to dpu_writeback.c"), since encoder's
> > atomic_check() is called under a different conditions that the
> > connector's atomic_check() (e.g. it is not called if there is no
> > connected CRTC or if the corresponding connector is not a part of the
> > new state).
> >
> > Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
> > Cc: stable@vger.kernel.org
> > Reported-by: Leonard Lausen <leonard@lausen.nl>
> > Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
> > Tested-by: Leonard Lausen <leonard@lausen.nl> # on sc7180 lazor
> > Reported-by: György Kurucz <me@kuruczgy.com>
> > Link: https://lore.kernel.org/all/b70a4d1d-f98f-4169-942c-cb9006a42b40@kuruczgy.com/
> > Reported-by: Johan Hovold <johan+linaro@kernel.org>
> > Link: https://lore.kernel.org/all/ZzyYI8KkWK36FfXf@hovoldconsulting.com/
> > Tested-by: György Kurucz <me@kuruczgy.com>
> > Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
> > Tested-by: Johan Hovold <johan+linaro@kernel.org>
> > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > ---
> > Leonard Lausen reported an issue with suspend/resume of the sc7180
> > devices. Fix the WB atomic check, which caused the issue.
> > ---
> > Changes in v4:
> > - Epanded commit message (Johan)
> > - Link to v3: https://lore.kernel.org/r/20241208-dpu-fix-wb-v3-1-a1de69ce4a1b@linaro.org
> >
> > Changes in v3:
> > - Rebased on top of msm-fixes
> > - Link to v2: https://lore.kernel.org/r/20240802-dpu-fix-wb-v2-0-7eac9eb8e895@linaro.org
> >
> > Changes in v2:
> > - Reworked the writeback to just drop the connector->status check.
> > - Expanded commit message for the debugging patch.
> > - Link to v1: https://lore.kernel.org/r/20240709-dpu-fix-wb-v1-0-448348bfd4cb@linaro.org
> > ---
> >   drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
> >   1 file changed, 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> > index 16f144cbc0c986ee266412223d9e605b01f9fb8c..8ff496082902b1ee713e806140f39b4730ed256a 100644
> > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> > @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
> >       if (!conn_state || !conn_state->connector) {
> >               DPU_ERROR("invalid connector state\n");
> >               return -EINVAL;
> > -     } else if (conn_state->connector->status != connector_status_connected) {
> > -             DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
> > -             return -EINVAL;
> >       }
>
> Can you please explain me about why the status was not already connected?
>
> In all the places I checked (unless I missed something), if the detect()
> callback is not registered, the connector is assumed connected like below:
>
> 404     if (funcs->detect_ctx)
> 405             ret = funcs->detect_ctx(connector, ctx, force);
> 406     else if (connector->funcs->detect)
> 407             ret = connector->funcs->detect(connector, force);
> 408     else
> 409             ret = connector_status_connected;
>
> We do not register .detect for WB as WB connector should be always
> connected.
>
> What scenario or use-case is making the connector status to something
> other than connected?
>
> The places which mark the connector as unknown seem to be covered by
> warnings in drm_probe_helper.c but the bug report doesnt seem to be
> hitting those.

Because nobody asks for the getconnector on that connector. For
example,drm_client_for_each_connector_iter() explicitly skips
WRITEBACK connectors. So, drm_client_modeset_probe() also doesn't
request ->fill_modes() on that connector.

I'm almost sure that if somebody runs a `modetest -ac` on the
unpatched kernel after boot, there will be no suspend-related issues.
In fact, I've just checked on RB5.
/sys/class/drm/card0-Writeback-1/status reports 'unknown' before and
'connected' afterwards. You can easily replicate that on your
hardware.

>
> I am wondering if there is some case in fwk resetting this to unknown
> silently (which is incorrect) and perhaps other drivers dont hit this as
> there is a .detect registered which always returns connected and MSM
> should follow that.
>
> 111 static enum drm_connector_status
> 112 komeda_wb_connector_detect(struct drm_connector *connector, bool force)
> 113 {
> 114     return connector_status_connected;
> 115 }
> 116

No, that won't help. You can add a detect() callback and verify that
simply isn't getting called. It's not resetting the connector->status,
it's nobody setting it for the first time.

> >
> >       crtc = conn_state->crtc;
> >
> > ---
> > base-commit: 86313a9cd152330c634b25d826a281c6a002eb77
> > change-id: 20240709-dpu-fix-wb-6cd57e3eb182
> >
> > Best regards,
Abhinav Kumar Dec. 9, 2024, 11:36 p.m. UTC | #3
On 12/9/2024 1:55 PM, Dmitry Baryshkov wrote:
> On Mon, 9 Dec 2024 at 21:54, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
>>
>>
>>
>> On 12/9/2024 2:04 AM, Dmitry Baryshkov wrote:
>>> During suspend/resume process all connectors are explicitly disabled and
>>> then reenabled. However resume fails because of the connector_status check:
>>>
>>> [dpu error]connector not connected 3
>>> [drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
>>>
>>> It doesn't make sense to check for the Writeback connected status (and
>>> other drivers don't perform such check), so drop the check.
>>>
>>> It wasn't a problem before the commit 71174f362d67 ("drm/msm/dpu: move
>>> writeback's atomic_check to dpu_writeback.c"), since encoder's
>>> atomic_check() is called under a different conditions that the
>>> connector's atomic_check() (e.g. it is not called if there is no
>>> connected CRTC or if the corresponding connector is not a part of the
>>> new state).
>>>
>>> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
>>> Cc: stable@vger.kernel.org
>>> Reported-by: Leonard Lausen <leonard@lausen.nl>
>>> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
>>> Tested-by: Leonard Lausen <leonard@lausen.nl> # on sc7180 lazor
>>> Reported-by: György Kurucz <me@kuruczgy.com>
>>> Link: https://lore.kernel.org/all/b70a4d1d-f98f-4169-942c-cb9006a42b40@kuruczgy.com/
>>> Reported-by: Johan Hovold <johan+linaro@kernel.org>
>>> Link: https://lore.kernel.org/all/ZzyYI8KkWK36FfXf@hovoldconsulting.com/
>>> Tested-by: György Kurucz <me@kuruczgy.com>
>>> Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
>>> Tested-by: Johan Hovold <johan+linaro@kernel.org>
>>> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>>> ---
>>> Leonard Lausen reported an issue with suspend/resume of the sc7180
>>> devices. Fix the WB atomic check, which caused the issue.
>>> ---
>>> Changes in v4:
>>> - Epanded commit message (Johan)
>>> - Link to v3: https://lore.kernel.org/r/20241208-dpu-fix-wb-v3-1-a1de69ce4a1b@linaro.org
>>>
>>> Changes in v3:
>>> - Rebased on top of msm-fixes
>>> - Link to v2: https://lore.kernel.org/r/20240802-dpu-fix-wb-v2-0-7eac9eb8e895@linaro.org
>>>
>>> Changes in v2:
>>> - Reworked the writeback to just drop the connector->status check.
>>> - Expanded commit message for the debugging patch.
>>> - Link to v1: https://lore.kernel.org/r/20240709-dpu-fix-wb-v1-0-448348bfd4cb@linaro.org
>>> ---
>>>    drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
>>>    1 file changed, 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>>> index 16f144cbc0c986ee266412223d9e605b01f9fb8c..8ff496082902b1ee713e806140f39b4730ed256a 100644
>>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>>> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
>>>        if (!conn_state || !conn_state->connector) {
>>>                DPU_ERROR("invalid connector state\n");
>>>                return -EINVAL;
>>> -     } else if (conn_state->connector->status != connector_status_connected) {
>>> -             DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
>>> -             return -EINVAL;
>>>        }
>>
>> Can you please explain me about why the status was not already connected?
>>
>> In all the places I checked (unless I missed something), if the detect()
>> callback is not registered, the connector is assumed connected like below:
>>
>> 404     if (funcs->detect_ctx)
>> 405             ret = funcs->detect_ctx(connector, ctx, force);
>> 406     else if (connector->funcs->detect)
>> 407             ret = connector->funcs->detect(connector, force);
>> 408     else
>> 409             ret = connector_status_connected;
>>
>> We do not register .detect for WB as WB connector should be always
>> connected.
>>
>> What scenario or use-case is making the connector status to something
>> other than connected?
>>
>> The places which mark the connector as unknown seem to be covered by
>> warnings in drm_probe_helper.c but the bug report doesnt seem to be
>> hitting those.
> 
> Because nobody asks for the getconnector on that connector. For
> example,drm_client_for_each_connector_iter() explicitly skips
> WRITEBACK connectors. So, drm_client_modeset_probe() also doesn't
> request ->fill_modes() on that connector.
> 
> I'm almost sure that if somebody runs a `modetest -ac` on the
> unpatched kernel after boot, there will be no suspend-related issues.
> In fact, I've just checked on RB5.
> /sys/class/drm/card0-Writeback-1/status reports 'unknown' before and
> 'connected' afterwards. You can easily replicate that on your
> hardware.
> 

Yes this is correct, I just checked on sc7180.

It stays at unknown till we run IGT. This matches your explanation 
perfectly.

>>
>> I am wondering if there is some case in fwk resetting this to unknown
>> silently (which is incorrect) and perhaps other drivers dont hit this as
>> there is a .detect registered which always returns connected and MSM
>> should follow that.
>>
>> 111 static enum drm_connector_status
>> 112 komeda_wb_connector_detect(struct drm_connector *connector, bool force)
>> 113 {
>> 114     return connector_status_connected;
>> 115 }
>> 116
> 
> No, that won't help. You can add a detect() callback and verify that
> simply isn't getting called. It's not resetting the connector->status,
> it's nobody setting it for the first time.
> 

What we found is that drm_atomic_helper_suspend() which calls 
drm_atomic_helper_duplicate_state(), uses drm_for_each_connector_iter() 
which does not rely on the last atomic state but actually uses the 
config->connector_list which in-turn disables all connectors including WB.

Is this expected that even though WB was not really there in the last 
atomic_state before the suspend, still ended up suspending / resuming 
and thus exposing this bug?

I am  now more convinced of this change as I understand the flow better. 
But wanted to highlight above observation.

>>>
>>>        crtc = conn_state->crtc;
>>>
>>> ---
>>> base-commit: 86313a9cd152330c634b25d826a281c6a002eb77
>>> change-id: 20240709-dpu-fix-wb-6cd57e3eb182
>>>
>>> Best regards,
> 
> 
>
Dmitry Baryshkov Dec. 9, 2024, 11:51 p.m. UTC | #4
On Mon, Dec 09, 2024 at 03:36:29PM -0800, Abhinav Kumar wrote:
> 
> 
> On 12/9/2024 1:55 PM, Dmitry Baryshkov wrote:
> > On Mon, 9 Dec 2024 at 21:54, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
> > > 
> > > 
> > > 
> > > On 12/9/2024 2:04 AM, Dmitry Baryshkov wrote:
> > > > During suspend/resume process all connectors are explicitly disabled and
> > > > then reenabled. However resume fails because of the connector_status check:
> > > > 
> > > > [dpu error]connector not connected 3
> > > > [drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
> > > > 
> > > > It doesn't make sense to check for the Writeback connected status (and
> > > > other drivers don't perform such check), so drop the check.
> > > > 
> > > > It wasn't a problem before the commit 71174f362d67 ("drm/msm/dpu: move
> > > > writeback's atomic_check to dpu_writeback.c"), since encoder's
> > > > atomic_check() is called under a different conditions that the
> > > > connector's atomic_check() (e.g. it is not called if there is no
> > > > connected CRTC or if the corresponding connector is not a part of the
> > > > new state).
> > > > 
> > > > Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
> > > > Cc: stable@vger.kernel.org
> > > > Reported-by: Leonard Lausen <leonard@lausen.nl>
> > > > Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
> > > > Tested-by: Leonard Lausen <leonard@lausen.nl> # on sc7180 lazor
> > > > Reported-by: György Kurucz <me@kuruczgy.com>
> > > > Link: https://lore.kernel.org/all/b70a4d1d-f98f-4169-942c-cb9006a42b40@kuruczgy.com/
> > > > Reported-by: Johan Hovold <johan+linaro@kernel.org>
> > > > Link: https://lore.kernel.org/all/ZzyYI8KkWK36FfXf@hovoldconsulting.com/
> > > > Tested-by: György Kurucz <me@kuruczgy.com>
> > > > Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
> > > > Tested-by: Johan Hovold <johan+linaro@kernel.org>
> > > > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > > > ---
> > > > Leonard Lausen reported an issue with suspend/resume of the sc7180
> > > > devices. Fix the WB atomic check, which caused the issue.
> > > > ---
> > > > Changes in v4:
> > > > - Epanded commit message (Johan)
> > > > - Link to v3: https://lore.kernel.org/r/20241208-dpu-fix-wb-v3-1-a1de69ce4a1b@linaro.org
> > > > 
> > > > Changes in v3:
> > > > - Rebased on top of msm-fixes
> > > > - Link to v2: https://lore.kernel.org/r/20240802-dpu-fix-wb-v2-0-7eac9eb8e895@linaro.org
> > > > 
> > > > Changes in v2:
> > > > - Reworked the writeback to just drop the connector->status check.
> > > > - Expanded commit message for the debugging patch.
> > > > - Link to v1: https://lore.kernel.org/r/20240709-dpu-fix-wb-v1-0-448348bfd4cb@linaro.org
> > > > ---
> > > >    drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
> > > >    1 file changed, 3 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> > > > index 16f144cbc0c986ee266412223d9e605b01f9fb8c..8ff496082902b1ee713e806140f39b4730ed256a 100644
> > > > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> > > > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> > > > @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
> > > >        if (!conn_state || !conn_state->connector) {
> > > >                DPU_ERROR("invalid connector state\n");
> > > >                return -EINVAL;
> > > > -     } else if (conn_state->connector->status != connector_status_connected) {
> > > > -             DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
> > > > -             return -EINVAL;
> > > >        }
> > > 
> > > Can you please explain me about why the status was not already connected?
> > > 
> > > In all the places I checked (unless I missed something), if the detect()
> > > callback is not registered, the connector is assumed connected like below:
> > > 
> > > 404     if (funcs->detect_ctx)
> > > 405             ret = funcs->detect_ctx(connector, ctx, force);
> > > 406     else if (connector->funcs->detect)
> > > 407             ret = connector->funcs->detect(connector, force);
> > > 408     else
> > > 409             ret = connector_status_connected;
> > > 
> > > We do not register .detect for WB as WB connector should be always
> > > connected.
> > > 
> > > What scenario or use-case is making the connector status to something
> > > other than connected?
> > > 
> > > The places which mark the connector as unknown seem to be covered by
> > > warnings in drm_probe_helper.c but the bug report doesnt seem to be
> > > hitting those.
> > 
> > Because nobody asks for the getconnector on that connector. For
> > example,drm_client_for_each_connector_iter() explicitly skips
> > WRITEBACK connectors. So, drm_client_modeset_probe() also doesn't
> > request ->fill_modes() on that connector.
> > 
> > I'm almost sure that if somebody runs a `modetest -ac` on the
> > unpatched kernel after boot, there will be no suspend-related issues.
> > In fact, I've just checked on RB5.
> > /sys/class/drm/card0-Writeback-1/status reports 'unknown' before and
> > 'connected' afterwards. You can easily replicate that on your
> > hardware.
> > 
> 
> Yes this is correct, I just checked on sc7180.
> 
> It stays at unknown till we run IGT. This matches your explanation
> perfectly.
> 
> > > 
> > > I am wondering if there is some case in fwk resetting this to unknown
> > > silently (which is incorrect) and perhaps other drivers dont hit this as
> > > there is a .detect registered which always returns connected and MSM
> > > should follow that.
> > > 
> > > 111 static enum drm_connector_status
> > > 112 komeda_wb_connector_detect(struct drm_connector *connector, bool force)
> > > 113 {
> > > 114     return connector_status_connected;
> > > 115 }
> > > 116
> > 
> > No, that won't help. You can add a detect() callback and verify that
> > simply isn't getting called. It's not resetting the connector->status,
> > it's nobody setting it for the first time.
> > 
> 
> What we found is that drm_atomic_helper_suspend() which calls
> drm_atomic_helper_duplicate_state(), uses drm_for_each_connector_iter()
> which does not rely on the last atomic state but actually uses the
> config->connector_list which in-turn disables all connectors including WB.
> 
> Is this expected that even though WB was not really there in the last
> atomic_state before the suspend, still ended up suspending / resuming and
> thus exposing this bug?

Note, atomic_state is a "patch", not a full state. Thus the described
behaviour is expected: it walks over all connectors and checks their
drm_connector_state information. Some of the connectors (if they were
not touched by the commit) might have been skipped from the last
drm_atomic_state structure.

> 
> I am  now more convinced of this change as I understand the flow better. But
> wanted to highlight above observation.
> 
> > > > 
> > > >        crtc = conn_state->crtc;
> > > > 
> > > > ---
> > > > base-commit: 86313a9cd152330c634b25d826a281c6a002eb77
> > > > change-id: 20240709-dpu-fix-wb-6cd57e3eb182
> > > > 
> > > > Best regards,
> > 
> > 
> >
Abhinav Kumar Dec. 10, 2024, 12:03 a.m. UTC | #5
On 12/9/2024 3:51 PM, Dmitry Baryshkov wrote:
> On Mon, Dec 09, 2024 at 03:36:29PM -0800, Abhinav Kumar wrote:
>>
>>
>> On 12/9/2024 1:55 PM, Dmitry Baryshkov wrote:
>>> On Mon, 9 Dec 2024 at 21:54, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
>>>>
>>>>
>>>>
>>>> On 12/9/2024 2:04 AM, Dmitry Baryshkov wrote:
>>>>> During suspend/resume process all connectors are explicitly disabled and
>>>>> then reenabled. However resume fails because of the connector_status check:
>>>>>
>>>>> [dpu error]connector not connected 3
>>>>> [drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
>>>>>
>>>>> It doesn't make sense to check for the Writeback connected status (and
>>>>> other drivers don't perform such check), so drop the check.
>>>>>
>>>>> It wasn't a problem before the commit 71174f362d67 ("drm/msm/dpu: move
>>>>> writeback's atomic_check to dpu_writeback.c"), since encoder's
>>>>> atomic_check() is called under a different conditions that the
>>>>> connector's atomic_check() (e.g. it is not called if there is no
>>>>> connected CRTC or if the corresponding connector is not a part of the
>>>>> new state).
>>>>>
>>>>> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
>>>>> Cc: stable@vger.kernel.org
>>>>> Reported-by: Leonard Lausen <leonard@lausen.nl>
>>>>> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
>>>>> Tested-by: Leonard Lausen <leonard@lausen.nl> # on sc7180 lazor
>>>>> Reported-by: György Kurucz <me@kuruczgy.com>
>>>>> Link: https://lore.kernel.org/all/b70a4d1d-f98f-4169-942c-cb9006a42b40@kuruczgy.com/
>>>>> Reported-by: Johan Hovold <johan+linaro@kernel.org>
>>>>> Link: https://lore.kernel.org/all/ZzyYI8KkWK36FfXf@hovoldconsulting.com/
>>>>> Tested-by: György Kurucz <me@kuruczgy.com>
>>>>> Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
>>>>> Tested-by: Johan Hovold <johan+linaro@kernel.org>
>>>>> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>>>>> ---
>>>>> Leonard Lausen reported an issue with suspend/resume of the sc7180
>>>>> devices. Fix the WB atomic check, which caused the issue.
>>>>> ---
>>>>> Changes in v4:
>>>>> - Epanded commit message (Johan)
>>>>> - Link to v3: https://lore.kernel.org/r/20241208-dpu-fix-wb-v3-1-a1de69ce4a1b@linaro.org
>>>>>
>>>>> Changes in v3:
>>>>> - Rebased on top of msm-fixes
>>>>> - Link to v2: https://lore.kernel.org/r/20240802-dpu-fix-wb-v2-0-7eac9eb8e895@linaro.org
>>>>>
>>>>> Changes in v2:
>>>>> - Reworked the writeback to just drop the connector->status check.
>>>>> - Expanded commit message for the debugging patch.
>>>>> - Link to v1: https://lore.kernel.org/r/20240709-dpu-fix-wb-v1-0-448348bfd4cb@linaro.org
>>>>> ---
>>>>>     drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
>>>>>     1 file changed, 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>>>>> index 16f144cbc0c986ee266412223d9e605b01f9fb8c..8ff496082902b1ee713e806140f39b4730ed256a 100644
>>>>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>>>>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>>>>> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
>>>>>         if (!conn_state || !conn_state->connector) {
>>>>>                 DPU_ERROR("invalid connector state\n");
>>>>>                 return -EINVAL;
>>>>> -     } else if (conn_state->connector->status != connector_status_connected) {
>>>>> -             DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
>>>>> -             return -EINVAL;
>>>>>         }
>>>>
>>>> Can you please explain me about why the status was not already connected?
>>>>
>>>> In all the places I checked (unless I missed something), if the detect()
>>>> callback is not registered, the connector is assumed connected like below:
>>>>
>>>> 404     if (funcs->detect_ctx)
>>>> 405             ret = funcs->detect_ctx(connector, ctx, force);
>>>> 406     else if (connector->funcs->detect)
>>>> 407             ret = connector->funcs->detect(connector, force);
>>>> 408     else
>>>> 409             ret = connector_status_connected;
>>>>
>>>> We do not register .detect for WB as WB connector should be always
>>>> connected.
>>>>
>>>> What scenario or use-case is making the connector status to something
>>>> other than connected?
>>>>
>>>> The places which mark the connector as unknown seem to be covered by
>>>> warnings in drm_probe_helper.c but the bug report doesnt seem to be
>>>> hitting those.
>>>
>>> Because nobody asks for the getconnector on that connector. For
>>> example,drm_client_for_each_connector_iter() explicitly skips
>>> WRITEBACK connectors. So, drm_client_modeset_probe() also doesn't
>>> request ->fill_modes() on that connector.
>>>
>>> I'm almost sure that if somebody runs a `modetest -ac` on the
>>> unpatched kernel after boot, there will be no suspend-related issues.
>>> In fact, I've just checked on RB5.
>>> /sys/class/drm/card0-Writeback-1/status reports 'unknown' before and
>>> 'connected' afterwards. You can easily replicate that on your
>>> hardware.
>>>
>>
>> Yes this is correct, I just checked on sc7180.
>>
>> It stays at unknown till we run IGT. This matches your explanation
>> perfectly.
>>
>>>>
>>>> I am wondering if there is some case in fwk resetting this to unknown
>>>> silently (which is incorrect) and perhaps other drivers dont hit this as
>>>> there is a .detect registered which always returns connected and MSM
>>>> should follow that.
>>>>
>>>> 111 static enum drm_connector_status
>>>> 112 komeda_wb_connector_detect(struct drm_connector *connector, bool force)
>>>> 113 {
>>>> 114     return connector_status_connected;
>>>> 115 }
>>>> 116
>>>
>>> No, that won't help. You can add a detect() callback and verify that
>>> simply isn't getting called. It's not resetting the connector->status,
>>> it's nobody setting it for the first time.
>>>
>>
>> What we found is that drm_atomic_helper_suspend() which calls
>> drm_atomic_helper_duplicate_state(), uses drm_for_each_connector_iter()
>> which does not rely on the last atomic state but actually uses the
>> config->connector_list which in-turn disables all connectors including WB.
>>
>> Is this expected that even though WB was not really there in the last
>> atomic_state before the suspend, still ended up suspending / resuming and
>> thus exposing this bug?
> 
> Note, atomic_state is a "patch", not a full state. Thus the described
> behaviour is expected: it walks over all connectors and checks their
> drm_connector_state information. Some of the connectors (if they were
> not touched by the commit) might have been skipped from the last
> drm_atomic_state structure.
> 

Yes, I understand the patched part. So what i meant was, we observed 
that CLIENT_CAP_WRITEBACK was never set which means WB was never exposed 
as a connector so it could not have been part of any commits. IOW, it 
would have never been enabled. In that case, is it still right to 
disable WB connector and enable it again considering that it could not 
have been enabled at any point before this.

>>
>> I am  now more convinced of this change as I understand the flow better. But
>> wanted to highlight above observation.
>>
>>>>>
>>>>>         crtc = conn_state->crtc;
>>>>>
>>>>> ---
>>>>> base-commit: 86313a9cd152330c634b25d826a281c6a002eb77
>>>>> change-id: 20240709-dpu-fix-wb-6cd57e3eb182
>>>>>
>>>>> Best regards,
>>>
>>>
>>>
>
Dmitry Baryshkov Dec. 10, 2024, 12:07 a.m. UTC | #6
On Tue, 10 Dec 2024 at 02:03, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
>
>
>
> On 12/9/2024 3:51 PM, Dmitry Baryshkov wrote:
> > On Mon, Dec 09, 2024 at 03:36:29PM -0800, Abhinav Kumar wrote:
> >>
> >>
> >> On 12/9/2024 1:55 PM, Dmitry Baryshkov wrote:
> >>> On Mon, 9 Dec 2024 at 21:54, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 12/9/2024 2:04 AM, Dmitry Baryshkov wrote:
> >>>>> During suspend/resume process all connectors are explicitly disabled and
> >>>>> then reenabled. However resume fails because of the connector_status check:
> >>>>>
> >>>>> [dpu error]connector not connected 3
> >>>>> [drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
> >>>>>
> >>>>> It doesn't make sense to check for the Writeback connected status (and
> >>>>> other drivers don't perform such check), so drop the check.
> >>>>>
> >>>>> It wasn't a problem before the commit 71174f362d67 ("drm/msm/dpu: move
> >>>>> writeback's atomic_check to dpu_writeback.c"), since encoder's
> >>>>> atomic_check() is called under a different conditions that the
> >>>>> connector's atomic_check() (e.g. it is not called if there is no
> >>>>> connected CRTC or if the corresponding connector is not a part of the
> >>>>> new state).
> >>>>>
> >>>>> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
> >>>>> Cc: stable@vger.kernel.org
> >>>>> Reported-by: Leonard Lausen <leonard@lausen.nl>
> >>>>> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
> >>>>> Tested-by: Leonard Lausen <leonard@lausen.nl> # on sc7180 lazor
> >>>>> Reported-by: György Kurucz <me@kuruczgy.com>
> >>>>> Link: https://lore.kernel.org/all/b70a4d1d-f98f-4169-942c-cb9006a42b40@kuruczgy.com/
> >>>>> Reported-by: Johan Hovold <johan+linaro@kernel.org>
> >>>>> Link: https://lore.kernel.org/all/ZzyYI8KkWK36FfXf@hovoldconsulting.com/
> >>>>> Tested-by: György Kurucz <me@kuruczgy.com>
> >>>>> Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
> >>>>> Tested-by: Johan Hovold <johan+linaro@kernel.org>
> >>>>> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> >>>>> ---
> >>>>> Leonard Lausen reported an issue with suspend/resume of the sc7180
> >>>>> devices. Fix the WB atomic check, which caused the issue.
> >>>>> ---
> >>>>> Changes in v4:
> >>>>> - Epanded commit message (Johan)
> >>>>> - Link to v3: https://lore.kernel.org/r/20241208-dpu-fix-wb-v3-1-a1de69ce4a1b@linaro.org
> >>>>>
> >>>>> Changes in v3:
> >>>>> - Rebased on top of msm-fixes
> >>>>> - Link to v2: https://lore.kernel.org/r/20240802-dpu-fix-wb-v2-0-7eac9eb8e895@linaro.org
> >>>>>
> >>>>> Changes in v2:
> >>>>> - Reworked the writeback to just drop the connector->status check.
> >>>>> - Expanded commit message for the debugging patch.
> >>>>> - Link to v1: https://lore.kernel.org/r/20240709-dpu-fix-wb-v1-0-448348bfd4cb@linaro.org
> >>>>> ---
> >>>>>     drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
> >>>>>     1 file changed, 3 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> >>>>> index 16f144cbc0c986ee266412223d9e605b01f9fb8c..8ff496082902b1ee713e806140f39b4730ed256a 100644
> >>>>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> >>>>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> >>>>> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
> >>>>>         if (!conn_state || !conn_state->connector) {
> >>>>>                 DPU_ERROR("invalid connector state\n");
> >>>>>                 return -EINVAL;
> >>>>> -     } else if (conn_state->connector->status != connector_status_connected) {
> >>>>> -             DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
> >>>>> -             return -EINVAL;
> >>>>>         }
> >>>>
> >>>> Can you please explain me about why the status was not already connected?
> >>>>
> >>>> In all the places I checked (unless I missed something), if the detect()
> >>>> callback is not registered, the connector is assumed connected like below:
> >>>>
> >>>> 404     if (funcs->detect_ctx)
> >>>> 405             ret = funcs->detect_ctx(connector, ctx, force);
> >>>> 406     else if (connector->funcs->detect)
> >>>> 407             ret = connector->funcs->detect(connector, force);
> >>>> 408     else
> >>>> 409             ret = connector_status_connected;
> >>>>
> >>>> We do not register .detect for WB as WB connector should be always
> >>>> connected.
> >>>>
> >>>> What scenario or use-case is making the connector status to something
> >>>> other than connected?
> >>>>
> >>>> The places which mark the connector as unknown seem to be covered by
> >>>> warnings in drm_probe_helper.c but the bug report doesnt seem to be
> >>>> hitting those.
> >>>
> >>> Because nobody asks for the getconnector on that connector. For
> >>> example,drm_client_for_each_connector_iter() explicitly skips
> >>> WRITEBACK connectors. So, drm_client_modeset_probe() also doesn't
> >>> request ->fill_modes() on that connector.
> >>>
> >>> I'm almost sure that if somebody runs a `modetest -ac` on the
> >>> unpatched kernel after boot, there will be no suspend-related issues.
> >>> In fact, I've just checked on RB5.
> >>> /sys/class/drm/card0-Writeback-1/status reports 'unknown' before and
> >>> 'connected' afterwards. You can easily replicate that on your
> >>> hardware.
> >>>
> >>
> >> Yes this is correct, I just checked on sc7180.
> >>
> >> It stays at unknown till we run IGT. This matches your explanation
> >> perfectly.
> >>
> >>>>
> >>>> I am wondering if there is some case in fwk resetting this to unknown
> >>>> silently (which is incorrect) and perhaps other drivers dont hit this as
> >>>> there is a .detect registered which always returns connected and MSM
> >>>> should follow that.
> >>>>
> >>>> 111 static enum drm_connector_status
> >>>> 112 komeda_wb_connector_detect(struct drm_connector *connector, bool force)
> >>>> 113 {
> >>>> 114     return connector_status_connected;
> >>>> 115 }
> >>>> 116
> >>>
> >>> No, that won't help. You can add a detect() callback and verify that
> >>> simply isn't getting called. It's not resetting the connector->status,
> >>> it's nobody setting it for the first time.
> >>>
> >>
> >> What we found is that drm_atomic_helper_suspend() which calls
> >> drm_atomic_helper_duplicate_state(), uses drm_for_each_connector_iter()
> >> which does not rely on the last atomic state but actually uses the
> >> config->connector_list which in-turn disables all connectors including WB.
> >>
> >> Is this expected that even though WB was not really there in the last
> >> atomic_state before the suspend, still ended up suspending / resuming and
> >> thus exposing this bug?
> >
> > Note, atomic_state is a "patch", not a full state. Thus the described
> > behaviour is expected: it walks over all connectors and checks their
> > drm_connector_state information. Some of the connectors (if they were
> > not touched by the commit) might have been skipped from the last
> > drm_atomic_state structure.
> >
>
> Yes, I understand the patched part. So what i meant was, we observed
> that CLIENT_CAP_WRITEBACK was never set which means WB was never exposed
> as a connector so it could not have been part of any commits. IOW, it
> would have never been enabled. In that case, is it still right to
> disable WB connector and enable it again considering that it could not
> have been enabled at any point before this.

... to disable WB connector on suspend and restore its state on resume ...

Yes, it's correct behaviour. It requires less clumsy code and less
special cases compared to using some other euristics in order to
select, which connector was ever enabled.

>
> >>
> >> I am  now more convinced of this change as I understand the flow better. But
> >> wanted to highlight above observation.
> >>
> >>>>>
> >>>>>         crtc = conn_state->crtc;
> >>>>>
> >>>>> ---
> >>>>> base-commit: 86313a9cd152330c634b25d826a281c6a002eb77
> >>>>> change-id: 20240709-dpu-fix-wb-6cd57e3eb182
> >>>>>
> >>>>> Best regards,
> >>>
> >>>
> >>>
> >
Abhinav Kumar Dec. 10, 2024, 12:11 a.m. UTC | #7
On 12/9/2024 4:07 PM, Dmitry Baryshkov wrote:
> On Tue, 10 Dec 2024 at 02:03, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
>>
>>
>>
>> On 12/9/2024 3:51 PM, Dmitry Baryshkov wrote:
>>> On Mon, Dec 09, 2024 at 03:36:29PM -0800, Abhinav Kumar wrote:
>>>>
>>>>
>>>> On 12/9/2024 1:55 PM, Dmitry Baryshkov wrote:
>>>>> On Mon, 9 Dec 2024 at 21:54, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 12/9/2024 2:04 AM, Dmitry Baryshkov wrote:
>>>>>>> During suspend/resume process all connectors are explicitly disabled and
>>>>>>> then reenabled. However resume fails because of the connector_status check:
>>>>>>>
>>>>>>> [dpu error]connector not connected 3
>>>>>>> [drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
>>>>>>>
>>>>>>> It doesn't make sense to check for the Writeback connected status (and
>>>>>>> other drivers don't perform such check), so drop the check.
>>>>>>>
>>>>>>> It wasn't a problem before the commit 71174f362d67 ("drm/msm/dpu: move
>>>>>>> writeback's atomic_check to dpu_writeback.c"), since encoder's
>>>>>>> atomic_check() is called under a different conditions that the
>>>>>>> connector's atomic_check() (e.g. it is not called if there is no
>>>>>>> connected CRTC or if the corresponding connector is not a part of the
>>>>>>> new state).
>>>>>>>
>>>>>>> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
>>>>>>> Cc: stable@vger.kernel.org
>>>>>>> Reported-by: Leonard Lausen <leonard@lausen.nl>
>>>>>>> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
>>>>>>> Tested-by: Leonard Lausen <leonard@lausen.nl> # on sc7180 lazor
>>>>>>> Reported-by: György Kurucz <me@kuruczgy.com>
>>>>>>> Link: https://lore.kernel.org/all/b70a4d1d-f98f-4169-942c-cb9006a42b40@kuruczgy.com/
>>>>>>> Reported-by: Johan Hovold <johan+linaro@kernel.org>
>>>>>>> Link: https://lore.kernel.org/all/ZzyYI8KkWK36FfXf@hovoldconsulting.com/
>>>>>>> Tested-by: György Kurucz <me@kuruczgy.com>
>>>>>>> Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
>>>>>>> Tested-by: Johan Hovold <johan+linaro@kernel.org>
>>>>>>> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>>>>>>> ---
>>>>>>> Leonard Lausen reported an issue with suspend/resume of the sc7180
>>>>>>> devices. Fix the WB atomic check, which caused the issue.
>>>>>>> ---
>>>>>>> Changes in v4:
>>>>>>> - Epanded commit message (Johan)
>>>>>>> - Link to v3: https://lore.kernel.org/r/20241208-dpu-fix-wb-v3-1-a1de69ce4a1b@linaro.org
>>>>>>>
>>>>>>> Changes in v3:
>>>>>>> - Rebased on top of msm-fixes
>>>>>>> - Link to v2: https://lore.kernel.org/r/20240802-dpu-fix-wb-v2-0-7eac9eb8e895@linaro.org
>>>>>>>
>>>>>>> Changes in v2:
>>>>>>> - Reworked the writeback to just drop the connector->status check.
>>>>>>> - Expanded commit message for the debugging patch.
>>>>>>> - Link to v1: https://lore.kernel.org/r/20240709-dpu-fix-wb-v1-0-448348bfd4cb@linaro.org
>>>>>>> ---
>>>>>>>      drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
>>>>>>>      1 file changed, 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>>>>>>> index 16f144cbc0c986ee266412223d9e605b01f9fb8c..8ff496082902b1ee713e806140f39b4730ed256a 100644
>>>>>>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>>>>>>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>>>>>>> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
>>>>>>>          if (!conn_state || !conn_state->connector) {
>>>>>>>                  DPU_ERROR("invalid connector state\n");
>>>>>>>                  return -EINVAL;
>>>>>>> -     } else if (conn_state->connector->status != connector_status_connected) {
>>>>>>> -             DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
>>>>>>> -             return -EINVAL;
>>>>>>>          }
>>>>>>
>>>>>> Can you please explain me about why the status was not already connected?
>>>>>>
>>>>>> In all the places I checked (unless I missed something), if the detect()
>>>>>> callback is not registered, the connector is assumed connected like below:
>>>>>>
>>>>>> 404     if (funcs->detect_ctx)
>>>>>> 405             ret = funcs->detect_ctx(connector, ctx, force);
>>>>>> 406     else if (connector->funcs->detect)
>>>>>> 407             ret = connector->funcs->detect(connector, force);
>>>>>> 408     else
>>>>>> 409             ret = connector_status_connected;
>>>>>>
>>>>>> We do not register .detect for WB as WB connector should be always
>>>>>> connected.
>>>>>>
>>>>>> What scenario or use-case is making the connector status to something
>>>>>> other than connected?
>>>>>>
>>>>>> The places which mark the connector as unknown seem to be covered by
>>>>>> warnings in drm_probe_helper.c but the bug report doesnt seem to be
>>>>>> hitting those.
>>>>>
>>>>> Because nobody asks for the getconnector on that connector. For
>>>>> example,drm_client_for_each_connector_iter() explicitly skips
>>>>> WRITEBACK connectors. So, drm_client_modeset_probe() also doesn't
>>>>> request ->fill_modes() on that connector.
>>>>>
>>>>> I'm almost sure that if somebody runs a `modetest -ac` on the
>>>>> unpatched kernel after boot, there will be no suspend-related issues.
>>>>> In fact, I've just checked on RB5.
>>>>> /sys/class/drm/card0-Writeback-1/status reports 'unknown' before and
>>>>> 'connected' afterwards. You can easily replicate that on your
>>>>> hardware.
>>>>>
>>>>
>>>> Yes this is correct, I just checked on sc7180.
>>>>
>>>> It stays at unknown till we run IGT. This matches your explanation
>>>> perfectly.
>>>>
>>>>>>
>>>>>> I am wondering if there is some case in fwk resetting this to unknown
>>>>>> silently (which is incorrect) and perhaps other drivers dont hit this as
>>>>>> there is a .detect registered which always returns connected and MSM
>>>>>> should follow that.
>>>>>>
>>>>>> 111 static enum drm_connector_status
>>>>>> 112 komeda_wb_connector_detect(struct drm_connector *connector, bool force)
>>>>>> 113 {
>>>>>> 114     return connector_status_connected;
>>>>>> 115 }
>>>>>> 116
>>>>>
>>>>> No, that won't help. You can add a detect() callback and verify that
>>>>> simply isn't getting called. It's not resetting the connector->status,
>>>>> it's nobody setting it for the first time.
>>>>>
>>>>
>>>> What we found is that drm_atomic_helper_suspend() which calls
>>>> drm_atomic_helper_duplicate_state(), uses drm_for_each_connector_iter()
>>>> which does not rely on the last atomic state but actually uses the
>>>> config->connector_list which in-turn disables all connectors including WB.
>>>>
>>>> Is this expected that even though WB was not really there in the last
>>>> atomic_state before the suspend, still ended up suspending / resuming and
>>>> thus exposing this bug?
>>>
>>> Note, atomic_state is a "patch", not a full state. Thus the described
>>> behaviour is expected: it walks over all connectors and checks their
>>> drm_connector_state information. Some of the connectors (if they were
>>> not touched by the commit) might have been skipped from the last
>>> drm_atomic_state structure.
>>>
>>
>> Yes, I understand the patched part. So what i meant was, we observed
>> that CLIENT_CAP_WRITEBACK was never set which means WB was never exposed
>> as a connector so it could not have been part of any commits. IOW, it
>> would have never been enabled. In that case, is it still right to
>> disable WB connector and enable it again considering that it could not
>> have been enabled at any point before this.
> 
> ... to disable WB connector on suspend and restore its state on resume ...
> 
> Yes, it's correct behaviour. It requires less clumsy code and less
> special cases compared to using some other euristics in order to
> select, which connector was ever enabled.
> 

Acked..it will be harder to track this for sure, I am fine with this now

Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>

FWIW, we retested kms_writeback with this and it still works as well.

>>
>>>>
>>>> I am  now more convinced of this change as I understand the flow better. But
>>>> wanted to highlight above observation.
>>>>
>>>>>>>
>>>>>>>          crtc = conn_state->crtc;
>>>>>>>
>>>>>>> ---
>>>>>>> base-commit: 86313a9cd152330c634b25d826a281c6a002eb77
>>>>>>> change-id: 20240709-dpu-fix-wb-6cd57e3eb182
>>>>>>>
>>>>>>> Best regards,
>>>>>
>>>>>
>>>>>
>>>
> 
> 
>
Jessica Zhang Dec. 10, 2024, 12:12 a.m. UTC | #8
On 12/9/2024 2:04 AM, Dmitry Baryshkov wrote:
> During suspend/resume process all connectors are explicitly disabled and
> then reenabled. However resume fails because of the connector_status check:
> 
> [dpu error]connector not connected 3
> [drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
> 
> It doesn't make sense to check for the Writeback connected status (and
> other drivers don't perform such check), so drop the check.
> 
> It wasn't a problem before the commit 71174f362d67 ("drm/msm/dpu: move
> writeback's atomic_check to dpu_writeback.c"), since encoder's
> atomic_check() is called under a different conditions that the
> connector's atomic_check() (e.g. it is not called if there is no
> connected CRTC or if the corresponding connector is not a part of the
> new state).
> 
> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
> Cc: stable@vger.kernel.org
> Reported-by: Leonard Lausen <leonard@lausen.nl>
> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
> Tested-by: Leonard Lausen <leonard@lausen.nl> # on sc7180 lazor
> Reported-by: György Kurucz <me@kuruczgy.com>
> Link: https://lore.kernel.org/all/b70a4d1d-f98f-4169-942c-cb9006a42b40@kuruczgy.com/
> Reported-by: Johan Hovold <johan+linaro@kernel.org>
> Link: https://lore.kernel.org/all/ZzyYI8KkWK36FfXf@hovoldconsulting.com/
> Tested-by: György Kurucz <me@kuruczgy.com>
> Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
> Tested-by: Johan Hovold <johan+linaro@kernel.org>
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

Tested-by: Jessica Zhang <quic_jesszhan@quicinc.com> # Trogdor (sc7180)

> ---
> Leonard Lausen reported an issue with suspend/resume of the sc7180
> devices. Fix the WB atomic check, which caused the issue.
> ---
> Changes in v4:
> - Epanded commit message (Johan)
> - Link to v3: https://lore.kernel.org/r/20241208-dpu-fix-wb-v3-1-a1de69ce4a1b@linaro.org
> 
> Changes in v3:
> - Rebased on top of msm-fixes
> - Link to v2: https://lore.kernel.org/r/20240802-dpu-fix-wb-v2-0-7eac9eb8e895@linaro.org
> 
> Changes in v2:
> - Reworked the writeback to just drop the connector->status check.
> - Expanded commit message for the debugging patch.
> - Link to v1: https://lore.kernel.org/r/20240709-dpu-fix-wb-v1-0-448348bfd4cb@linaro.org
> ---
>   drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
>   1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> index 16f144cbc0c986ee266412223d9e605b01f9fb8c..8ff496082902b1ee713e806140f39b4730ed256a 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
>   	if (!conn_state || !conn_state->connector) {
>   		DPU_ERROR("invalid connector state\n");
>   		return -EINVAL;
> -	} else if (conn_state->connector->status != connector_status_connected) {
> -		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
> -		return -EINVAL;
>   	}
>   
>   	crtc = conn_state->crtc;
> 
> ---
> base-commit: 86313a9cd152330c634b25d826a281c6a002eb77
> change-id: 20240709-dpu-fix-wb-6cd57e3eb182
> 
> Best regards,
> -- 
> Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> 
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
index 16f144cbc0c986ee266412223d9e605b01f9fb8c..8ff496082902b1ee713e806140f39b4730ed256a 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
@@ -42,9 +42,6 @@  static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
 	if (!conn_state || !conn_state->connector) {
 		DPU_ERROR("invalid connector state\n");
 		return -EINVAL;
-	} else if (conn_state->connector->status != connector_status_connected) {
-		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
-		return -EINVAL;
 	}
 
 	crtc = conn_state->crtc;