[v8,00/13] drm/msm: Capture and dump the GPU crash state

Message ID 20180724163331.18250-1-jcrouse@codeaurora.org (mailing list archive)

Jordan Crouse July 24, 2018, 4:33 p.m. UTC
This is revision 8 of the GPU crash state series for drm/msm
(https://patchwork.freedesktop.org/series/36097/). This revision adds better
documentation and addresses comments from the mailing lists. I know we will
miss 4.19 at this point, but I think this is ready to soak in msm-next for
a while.

The goal of this code is to store and provide enough information to debug
software and hardware issues on the Adreno hardware in a semi-human-readable
format that can also be parsed by scripts.

The full set of changes captures basic information about the GPU, the status
and contents of the ringbuffers, a snapshot of the current register state, and
the active buffers from the hanging submit.

The data is exposed through devcoredump.  For example, after a hang you can
read it from /sys/class/devcoredump/devcdX/data where X is a unique number.
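
As a rough illustration of collecting the dump from user space, the sketch
below lists the devcdX entries and copies one of them to a regular file. The
helper names find_coredumps() and save_coredump() are hypothetical and not part
of this series; the only thing taken from the description above is the
/sys/class/devcoredump/devcdX/data layout.

```python
import glob
import os
import shutil


def find_coredumps(root="/sys/class/devcoredump"):
    """Return the data file of every devcdX entry under root, sorted by X."""
    paths = glob.glob(os.path.join(root, "devcd*", "data"))

    def index(path):
        # The directory name is "devcd" followed by the unique number X.
        name = os.path.basename(os.path.dirname(path))
        suffix = name[len("devcd"):]
        return int(suffix) if suffix.isdigit() else -1

    return sorted(paths, key=index)


def save_coredump(data_path, dest):
    """Copy one dump to dest and return the number of bytes saved."""
    with open(data_path, "rb") as src, open(dest, "wb") as out:
        shutil.copyfileobj(src, out)
    return os.path.getsize(dest)
```

A caller would typically loop over find_coredumps() shortly after a hang and
pass each path to save_coredump(), since the entries are not meant to be kept
around indefinitely.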

v8: Add documentation and consolidate puts/printf code from code comments
v7: Add EXPORT_SYMBOL for __drm_puts_coredump and use %zd to print a size_t
variable for the bo dump thanks to the ever vigilant zero one bot.
v6: Add drm_puts() and use it in the appropriate place.  Clean up a few minor
bugs here and there.
v5: Fix symbol error in i915_gpu_error.c thanks to 01 dot org bot. Added
open/release functions for the show debugfs file to get the state per Chris
Wilson. Slightly modified the register output format to be more YAML friendly
also per Chris.
v4: Add buffer dump for the active submit. Fix refcount issue with devcoredump.
Change header for a5xx registers to registers-hlsq because I'm told YAML
requires unique tags.
v3: Make recommended changes to ascii85 per Chris Wilson. Use devcoredump to
dump crash states as suggested by Bjorn Andersson and add a new drm_print
facility to facilitate that. Remove the now obsolete 'crash' debugfs node.
Add documentation for the crash dump output.
v2: Convert output to yaml, use ascii85 to dump ringbuffer contents.
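
For anyone writing a script against the ascii85 ringbuffer and buffer data
mentioned above, here is a minimal user-space sketch of the encoding. It
assumes each 32-bit word becomes five characters starting at '!', most
significant digit first, with an all-zero word abbreviated to 'z'. That
matches my understanding of the helper being moved to linux/ascii85.h, but
treat it as an assumption and check it against a real dump. The function names
are hypothetical.

```python
import base64
import struct


def encode_word(value):
    """Encode one 32-bit word: 'z' for zero, otherwise five base-85 digits."""
    if value == 0:
        return "z"
    digits = []
    for _ in range(5):
        # Least significant digit first; reversed below.
        digits.append(chr(ord("!") + value % 85))
        value //= 85
    return "".join(reversed(digits))


def encode_words(words):
    """Encode a sequence of words, masking each to 32 bits."""
    return "".join(encode_word(w & 0xFFFFFFFF) for w in words)


def decode_words(text):
    """Decode ascii85 text back into a list of 32-bit words."""
    # a85decode expands 'z' to four zero bytes and skips whitespace.
    raw = base64.a85decode(text.encode("ascii"))
    count = len(raw) // 4
    # Each five-character group decodes to one big-endian 32-bit value.
    return list(struct.unpack(f">{count}I", raw[:count * 4]))
```

The decoder leans on Python's base64.a85decode rather than a hand-written
loop, so a parsing script only needs to pull the data string out of the dump
and pass it to decode_words().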

Jordan Crouse (13):
  include: Move ascii85 functions from i915 to linux/ascii85.h
  drm: drm_printer: Add printer for devcoredump
  drm: Add drm_puts() to complement drm_printf()
  drm: Add a -puts() function for the seq_file printer
  drm: Add puts callback for the coredump printer
  drm/msm/gpu: Capture the state of the GPU
  drm/msm/gpu: Convert the GPU show function to use the GPU state
  drm/msm/gpu: Rearrange the code that collects the task during a hang
  drm/msm/gpu: Capture the GPU state on a GPU hang
  drm/msm/adreno: Convert the show/crash file format
  drm/msm/adreno: Add ringbuffer data to the GPU state
  drm/msm/adreno: Add a5xx specific registers for the GPU state
  drm/msm/gpu: Add the buffer objects from the submit to the crash dump

 Documentation/gpu/msm-crash-dump.rst    |  96 ++++++++++
 drivers/gpu/drm/drm_print.c             | 111 +++++++++++
 drivers/gpu/drm/i915/i915_gpu_error.c   |  34 +---
 drivers/gpu/drm/msm/Kconfig             |   1 +
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |  30 +--
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |  22 ++-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   | 242 ++++++++++++++++++++++--
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 184 ++++++++++++++++--
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  10 +-
 drivers/gpu/drm/msm/msm_debugfs.c       |  93 ++++++++-
 drivers/gpu/drm/msm/msm_gpu.c           | 145 +++++++++++++-
 drivers/gpu/drm/msm/msm_gpu.h           |  68 ++++++-
 include/drm/drm_print.h                 |  71 +++++++
 include/linux/ascii85.h                 |  38 ++++
 14 files changed, 1044 insertions(+), 101 deletions(-)
 create mode 100644 Documentation/gpu/msm-crash-dump.rst
 create mode 100644 include/linux/ascii85.h