Message ID | 20230621005719.836857-1-andrealmeid@igalia.com (mailing list archive) |
---|---|
Series | drm: Standardize device reset notification |
On 21.06.23 at 02:57, André Almeida wrote:
> Hi,
>
> This is a new version of the documentation for DRM device resets. As I
> dived deeper into the subject, I started to believe that part of the
> problem was the lack of a DRM API to get reset information from the
> driver. With an API, we can better standardize reset queries, share more
> common code between DRM and Mesa, and make it easier to write end-to-end
> tests.
>
> So this patchset, along with the documentation, comes with a new IOCTL
> and two implementations of it for amdgpu and i915 (although only the
> former was really tested). This IOCTL uses the "context id" to query
> reset information, but this might not be generic enough to be included
> in a DRM API.

Well, the basic problem with that is that we don't have a standard DRM
context defined.

If you want to do this, you should probably start there first.

Apart from that, this looks like a really, really good idea to me,
especially documenting the reset expectations.

Regards,
Christian.

> At least for amdgpu, this information is encapsulated by libdrm, so one
> can't just call the ioctl directly from the UMD as I was planning to,
> but a small refactor can be done to expose the id. Anyway, I'm sharing
> it as is to gather feedback on whether this approach seems to work.
>
> The amdgpu and i915 implementations are provided as a means of testing
> and as examples, not as reference code yet, as the goal is more about
> the interface itself than the driver parts.
>
> For the documentation itself, after spending some time reading the reset
> path in the kernel and in Mesa, I decided to rewrite it to better reflect
> how it works, from bottom to top.
>
> You can check the userspace side of the IOCTL here:
> Mesa: https://gitlab.freedesktop.org/andrealmeid/mesa/-/commit/cd687b22fb32c21b23596c607003e2a495f465
> libdrm: https://gitlab.freedesktop.org/andrealmeid/libdrm/-/commit/b31e5404893ee9a85d1aa67e81c2f58c1dac3c46
>
> For testing, I use this Vulkan app that has an infinite loop in the shader:
> https://github.com/andrealmeid/vulkan-triangle-v1
>
> Feedback is welcome!
>
> Thanks,
> André
>
> v2: https://lore.kernel.org/all/20230227204000.56787-1-andrealmeid@igalia.com/
> v1: https://lore.kernel.org/all/20230123202646.356592-1-andrealmeid@igalia.com/
>
> André Almeida (4):
>   drm/doc: Document DRM device reset expectations
>   drm: Create DRM_IOCTL_GET_RESET
>   drm/amdgpu: Implement DRM_IOCTL_GET_RESET
>   drm/i915: Implement DRM_IOCTL_GET_RESET
>
>  Documentation/gpu/drm-uapi.rst                | 51 ++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       | 35 +++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h       |  5 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       | 12 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h       |  2 +
>  drivers/gpu/drm/drm_debugfs.c                 |  2 +
>  drivers/gpu/drm/drm_ioctl.c                   | 58 +++++++++++++++++++
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 18 ++++++
>  drivers/gpu/drm/i915/gem/i915_gem_context.h   |  2 +
>  .../gpu/drm/i915/gem/i915_gem_context_types.h |  2 +
>  drivers/gpu/drm/i915/i915_driver.c            |  2 +
>  include/drm/drm_device.h                      |  3 +
>  include/drm/drm_drv.h                         |  3 +
>  include/uapi/drm/drm.h                        | 21 +++++++
>  include/uapi/drm/drm_mode.h                   | 15 +++++
>  17 files changed, 233 insertions(+), 3 deletions(-)
On 21/06/2023 04:42, Christian König wrote:
> On 21.06.23 at 02:57, André Almeida wrote:
>> So this patchset, along with the documentation, comes with a new IOCTL
>> and two implementations of it for amdgpu and i915 (although only the
>> former was really tested). This IOCTL uses the "context id" to query
>> reset information, but this might not be generic enough to be included
>> in a DRM API.
>
> Well, the basic problem with that is that we don't have a standard DRM
> context defined.
>
> If you want to do this, you should probably start there first.

Any idea on how to start this? I tried to find previous work about that,
but I didn't find any.

>
> Apart from that, this looks like a really, really good idea to me,
> especially documenting the reset expectations.

I think I'll submit just the doc for the next version then, given that
the IOCTL will need a lot of rework.
On 21.06.23 at 17:06, André Almeida wrote:
> On 21/06/2023 04:42, Christian König wrote:
>> On 21.06.23 at 02:57, André Almeida wrote:
>>> So this patchset, along with the documentation, comes with a new IOCTL
>>> and two implementations of it for amdgpu and i915 (although only the
>>> former was really tested). This IOCTL uses the "context id" to query
>>> reset information, but this might not be generic enough to be included
>>> in a DRM API.
>>
>> Well, the basic problem with that is that we don't have a standard DRM
>> context defined.
>>
>> If you want to do this, you should probably start there first.
>
> Any idea on how to start this? I tried to find previous work about
> that, but I didn't find any.

I'm not aware of any work in this area; maybe ping the Mesa list as
well. It could be that someone looked into that but never sent anything
out.

>
>> Apart from that, this looks like a really, really good idea to me,
>> especially documenting the reset expectations.
>
> I think I'll submit just the doc for the next version then, given that
> the IOCTL will need a lot of rework.

Yeah, agree completely.

Thanks,
Christian.