mbox series

[RFC,0/2] drm/amd/display: switch amdgpu_dm_connector to

Message ID 20240126163429.56714-1-mwen@igalia.com (mailing list archive)
Headers show
Series drm/amd/display: switch amdgpu_dm_connector to | expand

Message

Melissa Wen Jan. 26, 2024, 4:28 p.m. UTC
Hi,

I'm debugging a null-pointer dereference when running
igt@kms_connector_force_edid and the way I found to solve the bug is to
stop using raw edid handler in amdgpu_connector_funcs_force and
create_eml_sink in favor of managing resouces via sruct drm_edid helpers
(Patch 1). The proper solution seems to be switch amdgpu_dm_connector
from struct edid to struct drm_edid and avoid the usage of different
approaches in the driver (Patch 2). However, doing it implies a good
amount of work and validation, therefore I decided to send this RFC
first to collect opinions and check if there is any parallel work on
this side. It's a working in progress.

The null-pointer error trigger by the igt@kms_connector_force_edid test
was introduced by:
- e54ed41620f ("drm/amd/display: Remove unwanted drm edid references")

You can check the error trace in the first patch.

This series was tested with kms_hdmi_inject and kms_force_connector. No
null-pointer error, kms_hdmi_inject is successul and kms_force_connector
is sucessful after the second execution - the force-edid subtest
still fails in the first run (I'm still investigating).

There is also a couple of cast warnings to be addressed - I'm looking
for the best replacement. 

I appreciate any feedback and testing.

Melissa

Melissa Wen (2):
  drm/amd/display: fix null-pointer dereference on edid reading
  drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 78 ++++++++++---------
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  4 +-
 .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  9 ++-
 .../display/amdgpu_dm/amdgpu_dm_mst_types.c   | 23 +++---
 4 files changed, 60 insertions(+), 54 deletions(-)

Comments

Mario Limonciello Jan. 26, 2024, 6:22 p.m. UTC | #1
On 1/26/2024 10:28, Melissa Wen wrote:
> Hi,
> 
> I'm debugging a null-pointer dereference when running
> igt@kms_connector_force_edid and the way I found to solve the bug is to
> stop using raw edid handler in amdgpu_connector_funcs_force and
> create_eml_sink in favor of managing resouces via sruct drm_edid helpers
> (Patch 1). The proper solution seems to be switch amdgpu_dm_connector
> from struct edid to struct drm_edid and avoid the usage of different
> approaches in the driver (Patch 2). However, doing it implies a good
> amount of work and validation, therefore I decided to send this RFC
> first to collect opinions and check if there is any parallel work on
> this side. It's a working in progress.
> 
> The null-pointer error trigger by the igt@kms_connector_force_edid test
> was introduced by:
> - e54ed41620f ("drm/amd/display: Remove unwanted drm edid references")
> 
> You can check the error trace in the first patch.
> 
> This series was tested with kms_hdmi_inject and kms_force_connector. No
> null-pointer error, kms_hdmi_inject is successul and kms_force_connector
> is sucessful after the second execution - the force-edid subtest
> still fails in the first run (I'm still investigating).
> 
> There is also a couple of cast warnings to be addressed - I'm looking
> for the best replacement.
> 
> I appreciate any feedback and testing.

So I'm actually a little bit worried by hardcoding EDID_LENGTH in this 
series.

I have some other patches that I'm posting later on that let you get the 
EDID from _DDC BIOS method too.  My observation was that the EDID can be 
anywhere up to 512 bytes according to the ACPI spec.

An earlier version of my patch was using EDID_LENGTH when fetching it 
and the EDID checksum failed.

I'll CC you on the post, we probably want to get your changes and mine 
merged together.

> 
> Melissa
> 
> Melissa Wen (2):
>    drm/amd/display: fix null-pointer dereference on edid reading
>    drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid
> 
>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 78 ++++++++++---------
>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  4 +-
>   .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  9 ++-
>   .../display/amdgpu_dm/amdgpu_dm_mst_types.c   | 23 +++---
>   4 files changed, 60 insertions(+), 54 deletions(-)
>
Jani Nikula Jan. 29, 2024, 9:30 a.m. UTC | #2
On Fri, 26 Jan 2024, Mario Limonciello <mario.limonciello@amd.com> wrote:
> On 1/26/2024 10:28, Melissa Wen wrote:
>> Hi,
>> 
>> I'm debugging a null-pointer dereference when running
>> igt@kms_connector_force_edid and the way I found to solve the bug is to
>> stop using raw edid handler in amdgpu_connector_funcs_force and
>> create_eml_sink in favor of managing resouces via sruct drm_edid helpers
>> (Patch 1). The proper solution seems to be switch amdgpu_dm_connector
>> from struct edid to struct drm_edid and avoid the usage of different
>> approaches in the driver (Patch 2). However, doing it implies a good
>> amount of work and validation, therefore I decided to send this RFC
>> first to collect opinions and check if there is any parallel work on
>> this side. It's a working in progress.
>> 
>> The null-pointer error trigger by the igt@kms_connector_force_edid test
>> was introduced by:
>> - e54ed41620f ("drm/amd/display: Remove unwanted drm edid references")
>> 
>> You can check the error trace in the first patch.
>> 
>> This series was tested with kms_hdmi_inject and kms_force_connector. No
>> null-pointer error, kms_hdmi_inject is successul and kms_force_connector
>> is sucessful after the second execution - the force-edid subtest
>> still fails in the first run (I'm still investigating).
>> 
>> There is also a couple of cast warnings to be addressed - I'm looking
>> for the best replacement.
>> 
>> I appreciate any feedback and testing.
>
> So I'm actually a little bit worried by hardcoding EDID_LENGTH in this 
> series.
>
> I have some other patches that I'm posting later on that let you get the 
> EDID from _DDC BIOS method too.  My observation was that the EDID can be 
> anywhere up to 512 bytes according to the ACPI spec.
>
> An earlier version of my patch was using EDID_LENGTH when fetching it 
> and the EDID checksum failed.
>
> I'll CC you on the post, we probably want to get your changes and mine 
> merged together.

One of the main points of struct drm_edid is that it tracks the
allocation size separately.

We should simply not trust edid->extensions, because most of the time it
originates from outside the kernel.

Using drm_edid and immediately drm_edid_raw() falls short. That function
should only be used during migration to help. And yeah, it also means
EDID parsing should be done in drm_edid.c, and not spread out all over
the subsystem.


BR,
Jani.


>
>> 
>> Melissa
>> 
>> Melissa Wen (2):
>>    drm/amd/display: fix null-pointer dereference on edid reading
>>    drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid
>> 
>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 78 ++++++++++---------
>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  4 +-
>>   .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  9 ++-
>>   .../display/amdgpu_dm/amdgpu_dm_mst_types.c   | 23 +++---
>>   4 files changed, 60 insertions(+), 54 deletions(-)
>> 
>
Melissa Wen Feb. 5, 2024, 2:27 p.m. UTC | #3
On 01/29, Jani Nikula wrote:
> On Fri, 26 Jan 2024, Mario Limonciello <mario.limonciello@amd.com> wrote:
> > On 1/26/2024 10:28, Melissa Wen wrote:
> >> Hi,
> >> 
> >> I'm debugging a null-pointer dereference when running
> >> igt@kms_connector_force_edid and the way I found to solve the bug is to
> >> stop using raw edid handler in amdgpu_connector_funcs_force and
> >> create_eml_sink in favor of managing resouces via sruct drm_edid helpers
> >> (Patch 1). The proper solution seems to be switch amdgpu_dm_connector
> >> from struct edid to struct drm_edid and avoid the usage of different
> >> approaches in the driver (Patch 2). However, doing it implies a good
> >> amount of work and validation, therefore I decided to send this RFC
> >> first to collect opinions and check if there is any parallel work on
> >> this side. It's a working in progress.
> >> 
> >> The null-pointer error trigger by the igt@kms_connector_force_edid test
> >> was introduced by:
> >> - e54ed41620f ("drm/amd/display: Remove unwanted drm edid references")
> >> 
> >> You can check the error trace in the first patch.
> >> 
> >> This series was tested with kms_hdmi_inject and kms_force_connector. No
> >> null-pointer error, kms_hdmi_inject is successul and kms_force_connector
> >> is sucessful after the second execution - the force-edid subtest
> >> still fails in the first run (I'm still investigating).
> >> 
> >> There is also a couple of cast warnings to be addressed - I'm looking
> >> for the best replacement.
> >> 
> >> I appreciate any feedback and testing.
> >
> > So I'm actually a little bit worried by hardcoding EDID_LENGTH in this 
> > series.
> >
> > I have some other patches that I'm posting later on that let you get the 
> > EDID from _DDC BIOS method too.  My observation was that the EDID can be 
> > anywhere up to 512 bytes according to the ACPI spec.
> >
> > An earlier version of my patch was using EDID_LENGTH when fetching it 
> > and the EDID checksum failed.
> >
> > I'll CC you on the post, we probably want to get your changes and mine 
> > merged together.
> 
> One of the main points of struct drm_edid is that it tracks the
> allocation size separately.
> 
> We should simply not trust edid->extensions, because most of the time it
> originates from outside the kernel.
> 
> Using drm_edid and immediately drm_edid_raw() falls short. That function
> should only be used during migration to help. And yeah, it also means
> EDID parsing should be done in drm_edid.c, and not spread out all over
> the subsystem.

Hi Mario and Jani,

Thanks for the feedback.

I agree with you. I used the drm_edid_raw() as an intermediate step to
assess/validate this migration, but I'll work on removing this hack.

So, I understand that the transition from edid to drm_edid is the right
path, so I'll improve this work (keeping the points you raised in mind)
and send a version.

BR,

Melissa
> 
> 
> BR,
> Jani.
> 
> 
> >
> >> 
> >> Melissa
> >> 
> >> Melissa Wen (2):
> >>    drm/amd/display: fix null-pointer dereference on edid reading
> >>    drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid
> >> 
> >>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 78 ++++++++++---------
> >>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  4 +-
> >>   .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c |  9 ++-
> >>   .../display/amdgpu_dm/amdgpu_dm_mst_types.c   | 23 +++---
> >>   4 files changed, 60 insertions(+), 54 deletions(-)
> >> 
> >
> 
> -- 
> Jani Nikula, Intel