Message ID | 20240126163429.56714-1-mwen@igalia.com (mailing list archive) |
---|---|
Headers | show |
Series | drm/amd/display: switch amdgpu_dm_connector to | expand |
On 1/26/2024 10:28, Melissa Wen wrote: > Hi, > > I'm debugging a null-pointer dereference when running > igt@kms_connector_force_edid and the way I found to solve the bug is to > stop using raw edid handler in amdgpu_connector_funcs_force and > create_eml_sink in favor of managing resouces via sruct drm_edid helpers > (Patch 1). The proper solution seems to be switch amdgpu_dm_connector > from struct edid to struct drm_edid and avoid the usage of different > approaches in the driver (Patch 2). However, doing it implies a good > amount of work and validation, therefore I decided to send this RFC > first to collect opinions and check if there is any parallel work on > this side. It's a working in progress. > > The null-pointer error trigger by the igt@kms_connector_force_edid test > was introduced by: > - e54ed41620f ("drm/amd/display: Remove unwanted drm edid references") > > You can check the error trace in the first patch. > > This series was tested with kms_hdmi_inject and kms_force_connector. No > null-pointer error, kms_hdmi_inject is successul and kms_force_connector > is sucessful after the second execution - the force-edid subtest > still fails in the first run (I'm still investigating). > > There is also a couple of cast warnings to be addressed - I'm looking > for the best replacement. > > I appreciate any feedback and testing. So I'm actually a little bit worried by hardcoding EDID_LENGTH in this series. I have some other patches that I'm posting later on that let you get the EDID from _DDC BIOS method too. My observation was that the EDID can be anywhere up to 512 bytes according to the ACPI spec. An earlier version of my patch was using EDID_LENGTH when fetching it and the EDID checksum failed. I'll CC you on the post, we probably want to get your changes and mine merged together. > > Melissa > > Melissa Wen (2): > drm/amd/display: fix null-pointer dereference on edid reading > drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid > > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 78 ++++++++++--------- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 4 +- > .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 9 ++- > .../display/amdgpu_dm/amdgpu_dm_mst_types.c | 23 +++--- > 4 files changed, 60 insertions(+), 54 deletions(-) >
On Fri, 26 Jan 2024, Mario Limonciello <mario.limonciello@amd.com> wrote: > On 1/26/2024 10:28, Melissa Wen wrote: >> Hi, >> >> I'm debugging a null-pointer dereference when running >> igt@kms_connector_force_edid and the way I found to solve the bug is to >> stop using raw edid handler in amdgpu_connector_funcs_force and >> create_eml_sink in favor of managing resouces via sruct drm_edid helpers >> (Patch 1). The proper solution seems to be switch amdgpu_dm_connector >> from struct edid to struct drm_edid and avoid the usage of different >> approaches in the driver (Patch 2). However, doing it implies a good >> amount of work and validation, therefore I decided to send this RFC >> first to collect opinions and check if there is any parallel work on >> this side. It's a working in progress. >> >> The null-pointer error trigger by the igt@kms_connector_force_edid test >> was introduced by: >> - e54ed41620f ("drm/amd/display: Remove unwanted drm edid references") >> >> You can check the error trace in the first patch. >> >> This series was tested with kms_hdmi_inject and kms_force_connector. No >> null-pointer error, kms_hdmi_inject is successul and kms_force_connector >> is sucessful after the second execution - the force-edid subtest >> still fails in the first run (I'm still investigating). >> >> There is also a couple of cast warnings to be addressed - I'm looking >> for the best replacement. >> >> I appreciate any feedback and testing. > > So I'm actually a little bit worried by hardcoding EDID_LENGTH in this > series. > > I have some other patches that I'm posting later on that let you get the > EDID from _DDC BIOS method too. My observation was that the EDID can be > anywhere up to 512 bytes according to the ACPI spec. > > An earlier version of my patch was using EDID_LENGTH when fetching it > and the EDID checksum failed. > > I'll CC you on the post, we probably want to get your changes and mine > merged together. One of the main points of struct drm_edid is that it tracks the allocation size separately. We should simply not trust edid->extensions, because most of the time it originates from outside the kernel. Using drm_edid and immediately drm_edid_raw() falls short. That function should only be used during migration to help. And yeah, it also means EDID parsing should be done in drm_edid.c, and not spread out all over the subsystem. BR, Jani. > >> >> Melissa >> >> Melissa Wen (2): >> drm/amd/display: fix null-pointer dereference on edid reading >> drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid >> >> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 78 ++++++++++--------- >> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 4 +- >> .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 9 ++- >> .../display/amdgpu_dm/amdgpu_dm_mst_types.c | 23 +++--- >> 4 files changed, 60 insertions(+), 54 deletions(-) >> >
On 01/29, Jani Nikula wrote: > On Fri, 26 Jan 2024, Mario Limonciello <mario.limonciello@amd.com> wrote: > > On 1/26/2024 10:28, Melissa Wen wrote: > >> Hi, > >> > >> I'm debugging a null-pointer dereference when running > >> igt@kms_connector_force_edid and the way I found to solve the bug is to > >> stop using raw edid handler in amdgpu_connector_funcs_force and > >> create_eml_sink in favor of managing resouces via sruct drm_edid helpers > >> (Patch 1). The proper solution seems to be switch amdgpu_dm_connector > >> from struct edid to struct drm_edid and avoid the usage of different > >> approaches in the driver (Patch 2). However, doing it implies a good > >> amount of work and validation, therefore I decided to send this RFC > >> first to collect opinions and check if there is any parallel work on > >> this side. It's a working in progress. > >> > >> The null-pointer error trigger by the igt@kms_connector_force_edid test > >> was introduced by: > >> - e54ed41620f ("drm/amd/display: Remove unwanted drm edid references") > >> > >> You can check the error trace in the first patch. > >> > >> This series was tested with kms_hdmi_inject and kms_force_connector. No > >> null-pointer error, kms_hdmi_inject is successul and kms_force_connector > >> is sucessful after the second execution - the force-edid subtest > >> still fails in the first run (I'm still investigating). > >> > >> There is also a couple of cast warnings to be addressed - I'm looking > >> for the best replacement. > >> > >> I appreciate any feedback and testing. > > > > So I'm actually a little bit worried by hardcoding EDID_LENGTH in this > > series. > > > > I have some other patches that I'm posting later on that let you get the > > EDID from _DDC BIOS method too. My observation was that the EDID can be > > anywhere up to 512 bytes according to the ACPI spec. > > > > An earlier version of my patch was using EDID_LENGTH when fetching it > > and the EDID checksum failed. > > > > I'll CC you on the post, we probably want to get your changes and mine > > merged together. > > One of the main points of struct drm_edid is that it tracks the > allocation size separately. > > We should simply not trust edid->extensions, because most of the time it > originates from outside the kernel. > > Using drm_edid and immediately drm_edid_raw() falls short. That function > should only be used during migration to help. And yeah, it also means > EDID parsing should be done in drm_edid.c, and not spread out all over > the subsystem. Hi Mario and Jani, Thanks for the feedback. I agree with you. I used the drm_edid_raw() as an intermediate step to assess/validate this migration, but I'll work on removing this hack. So, I understand that the transition from edid to drm_edid is the right path, so I'll improve this work (keeping the points you raised in mind) and send a version. BR, Melissa > > > BR, > Jani. > > > > > >> > >> Melissa > >> > >> Melissa Wen (2): > >> drm/amd/display: fix null-pointer dereference on edid reading > >> drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid > >> > >> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 78 ++++++++++--------- > >> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 4 +- > >> .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 9 ++- > >> .../display/amdgpu_dm/amdgpu_dm_mst_types.c | 23 +++--- > >> 4 files changed, 60 insertions(+), 54 deletions(-) > >> > > > > -- > Jani Nikula, Intel