[v2,0/8] drm/lima: fixes and improvements to error recovery

Message ID 20240124025947.2110659-1-nunes.erico@gmail.com (mailing list archive)

Message

Erico Nunes Jan. 24, 2024, 2:59 a.m. UTC
v1 reference:
https://patchwork.kernel.org/project/dri-devel/cover/20240117031212.1104034-1-nunes.erico@gmail.com/

Changes v1 -> v2:
- Dropped patch 1, which aimed to fix
https://gitlab.freedesktop.org/mesa/mesa/-/issues/8415 .
That will require more testing and an actual fix to the irq/timeout
handler race. It can be solved separately, so I am deferring it to a
follow-up patch and keeping that issue open.

- Added patches 2 and 4 so that the "reset time out" fix and the bus stop
bit are applied on hard reset in gp as well.

- Added handling of all processors in synchronize_irq in patch 5 to
cover multiple pp. Dropped unnecessary duplicate fence in patch 5.

- Added patch 7 in v2. After some discussion in patch 4 (v1), it seems
to be reasonable to bump our timeout value so that we further decrease
the chance of users actually hitting any of these timeouts by default.

- Reworked patch 8 in v2. Since I broadened the work to no longer focus
only on pp, I also included the change to the other blocks.

- Collected some reviews and acks in unmodified patches.
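
For readers who have not followed the patch 5 discussion, the idea of
treating a timeout as spurious when the completion interrupt was merely
late can be sketched in user space. This is only an illustration under
stated assumptions, not the code in the patch: every sim_* name below is
hypothetical, and sim_synchronize_irq() merely runs a delayed handler,
whereas the kernel's synchronize_irq() waits for in-flight handlers of an
irq to finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one processor (gp or a single pp core). */
struct sim_ip {
	bool irq_pending; /* completion irq raised, handler not yet run */
};

/* Hypothetical stand-in for a submitted job and its completion fence. */
struct sim_job {
	struct sim_ip *ips;  /* processors the job runs on */
	size_t num_ips;
	size_t ips_done;     /* processors whose irq has been handled */
	bool fence_signaled;
};

enum sim_timeout_result {
	SIM_TIMEOUT_SPURIOUS, /* job had finished, the irq was just late */
	SIM_TIMEOUT_RESET,    /* job did not finish, recovery is needed */
};

/* Simulated irq handler: signal the fence once all processors are done. */
void sim_irq_handler(struct sim_job *job, struct sim_ip *ip)
{
	ip->irq_pending = false;
	if (++job->ips_done == job->num_ips)
		job->fence_signaled = true;
}

/* Simplified stand-in for synchronize_irq(): run a delayed handler. */
void sim_synchronize_irq(struct sim_job *job, struct sim_ip *ip)
{
	if (ip->irq_pending)
		sim_irq_handler(job, ip);
}

/* Timeout handler: only ask for a reset if the job is really unfinished. */
enum sim_timeout_result sim_timedout_job(struct sim_job *job)
{
	assert(job != NULL);

	/* The job may have completed just before the timeout fired. */
	if (job->fence_signaled)
		return SIM_TIMEOUT_SPURIOUS;

	/* Cover every processor, since a job may span multiple pp. */
	for (size_t i = 0; i < job->num_ips; i++)
		sim_synchronize_irq(job, &job->ips[i]);

	/* Re-check after all pending handlers have had a chance to run. */
	if (job->fence_signaled)
		return SIM_TIMEOUT_SPURIOUS;

	return SIM_TIMEOUT_RESET;
}
```

In this sketch a job whose processors all have a pending irq is reported
as SIM_TIMEOUT_SPURIOUS, while a job with one processor that never raised
its irq is reported as SIM_TIMEOUT_RESET.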


Erico Nunes (8):
  drm/lima: reset async_reset on pp hard reset
  drm/lima: reset async_reset on gp hard reset
  drm/lima: set pp bus_stop bit before hard reset
  drm/lima: set gp bus_stop bit before hard reset
  drm/lima: handle spurious timeouts due to high irq latency
  drm/lima: remove guilty drm_sched context handling
  drm/lima: increase default job timeout to 10s
  drm/lima: standardize debug messages by ip name

 drivers/gpu/drm/lima/lima_ctx.c      |  2 +-
 drivers/gpu/drm/lima/lima_ctx.h      |  1 -
 drivers/gpu/drm/lima/lima_gp.c       | 39 +++++++++++++++++++++-------
 drivers/gpu/drm/lima/lima_l2_cache.c |  6 +++--
 drivers/gpu/drm/lima/lima_mmu.c      | 18 ++++++-------
 drivers/gpu/drm/lima/lima_pmu.c      |  3 ++-
 drivers/gpu/drm/lima/lima_pp.c       | 37 ++++++++++++++++++++------
 drivers/gpu/drm/lima/lima_sched.c    | 38 ++++++++++++++++++++++-----
 drivers/gpu/drm/lima/lima_sched.h    |  3 +--
 9 files changed, 107 insertions(+), 40 deletions(-)

Comments

Qiang Yu Jan. 30, 2024, 1:07 a.m. UTC | #1
Series is Reviewed-by: Qiang Yu <yuq825@gmail.com>

Qiang Yu Feb. 12, 2024, 8:40 a.m. UTC | #2
Applied to drm-misc-next.
