Message ID | 20240529-b4-media_docs_improve-v1-2-9ddb111c4433@collabora.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Documentation: Debugging guide | expand |
Hi Sebastian, This is a good idea, but it needs more work. The main change is that it needs to be split into two parts: debugging driver issues while developing the driver, and debugging userspace issues where you want to figure out why (typically) an ioctl fails. On 29/05/2024 18:22, Sebastian Fricke wrote: > Create a guides section for all documentation material, that isn't > strictly related to a specific piece of code. > > Provide a guide for developers on how to debug code with a focus on the > media subsystem. This document aims to provide a rough overview over the > possibilities and a rational to help choosing the right tool for the > given circumstances. > > Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com> > --- > Documentation/media/guides/debugging_issues.rst | 477 ++++++++++++++++++++++++ > Documentation/media/guides/index.rst | 11 + > Documentation/media/index.rst | 1 + > 3 files changed, 489 insertions(+) > > diff --git a/Documentation/media/guides/debugging_issues.rst b/Documentation/media/guides/debugging_issues.rst > new file mode 100644 > index 000000000000..8fee749e7e6a > --- /dev/null > +++ b/Documentation/media/guides/debugging_issues.rst > @@ -0,0 +1,477 @@ > +.. SPDX-License-Identifier: GPL-2.0 > +.. include:: <isonum.txt> > + > +=================================================== > +Debugging and tracing in the media subsystem > +=================================================== > + > +This document serves as a starting point and lookup for debugging device > +drivers in the media subsystem. > + > +.. contents:: > + :depth: 3 > + > +General debugging advice > +======================== > + > +Depending on the issue, a different set of tools is available to track down the > +problem or even to realize whether there is one in the first place. > + > +As a first step you have to figure out what kind of issue you want to debug. > +Depending on the answer, your methodology and choice of tools may vary. > +Some typical issues are listed below. The list is not exhaustive. > + > +What is the issue? > +------------------ > +1. Driver doesn't behave as expected > + > + TBD > + > +2. Running out of memory > + > + In this case you should use `Linux CLI tools <linux-tools_>`_ first to verify > + whether the issue stems from the Kernel or from userspace. > + > + TBD > + > +3. Kernel panic > + > + The kernel provides a number of `tools <kernel_panic_analysis_tools_>`_ to > + analyse a Kernel panic, use these first, once you have identified where the > + error occurs you can further analyse the issue by adding `debug logs <Printk > + & friends_>`_ to the codebase. > + > +4. Device doesn't probe > + > + TBD > + > +5. Driver too slow > + > + You should start with a `performance analysis <performance>`__, to pin down the problem area. I'm not sure how useful this section is. Too vague, really. > + > +Is timing a factor? > +------------------- > + > +It is important to understand if the problem you want to debug manifests itself > +consistently (i.e. given a set of inputs you always get the same, incorrect > +output), or inconsistently. If it manifests itself inconsistently, some timing > +factor might be at play. If inserting delays into the code does change the > +behavior, then quite likely timing is a factor. > + > + In this case using a simple `printk`_ won't work, a similar alternative is > + too use `trace_printk`_, which logs the debug messages to the trace file too -> to > + instead of the kernel log. > + > +Do I have root access to the system? > +------------------------------------ > + > +Are you easily able to replace the module in question or to install a new kernel? > + > +TBD > + > +.. _live_debug: > + > +Do I need to debug on a live system? > +------------------------------------ > + > +When the kernel is compiled with `Ftrace`_, this tool might be a good start, as you can trace specific functions to dial down to the sequence of actions that causes the misbehavior. > + > +TBD > + > +Is the outcome flaky? > +--------------------- > + > +We will call the outcome "flaky" if multiple iterations of the same routine with the same parameters result in different outcomes. > + > +TBD > + > +Available tools > +=============== > + > +Printk & friends > +---------------- > + > +These are derivatives of printf() with varying destination and support for being dynamically turned on or off, or lack thereof. > + > +.. _printk: > + > +**Simple printk** > +~~~~~~~~~~~~~~~~~ > + > +The classic, this can be used to great effect for quick and dirty development > +of new modules or to extract arbitrary necessary data for troubleshooting. > + > +Prerequisite: `CONFIG_PRINTK` (usually enabled by default) > + > +**Pros**: > + > +- No need to learn anything, simple to use > +- Easy to modify exactly to your needs (formatting of the data (See: `format specifiers <../../core-api/printk-formats.html#printk-specifiers>`__), visibility in the log) > +- Can cause delays in the execution of the code (beneficial to confirm whether timing is a factor) > + > +**Cons**: > + > +- Requires rebuilding the kernel/module, (See: `live_debug`_) > +- Can cause delays in the execution of the code > + > +`Full documentation <../../core-api/printk-basics.rst>`__ > + > +.. _trace_printk: > + > +**Trace_printk** > +~~~~~~~~~~~~~~~~ > + > +Prerequisite: `CONFIG_DYNAMIC_FTRACE` & `#include <linux/ftrace.h>` > + > +A tiny bit less comfortable to use than `printk`_, because you will have to > +read the messages from the trace file (See: `Reading the ftrace log`_ instead > +of from the kernel log, but very useful when printk adds unwanted delays into > +the code execution, causing issues to be flaky or hidden.) > + > +If the processing of this still causes timing issues then you can try `trace_puts()`. > + > +`Full Documentation <../../driver-api/basics.html#c.trace_printk>`__ > + > +**dev_dbg / v4l2_dbg** > +~~~~~~~~~~~~~~~~~~~~~~ > + > +- Difference between both? > + > + - v4l2_dbg utilizes v4l2_printk under the hood, which further uses printk directly, thus it cannot be targeted by dynamic debug > + - dev_dbg can be targeted by dynamic debug > + - v4l2_dbg has a more specific prefix format for the media subsystem, while dev_dbg only highlights the driver name and the location of the log > + > +**The dev_debug module parameter** This needs to be more prominent: it is very useful, esp. when debugging userspace problems. It is also always available, and all you need is root access in order to write to dev_debug. This is also mixed in with the various printk sections, but this should be a separate section. > + > +Every video device provides a `dev_debug` parameter, which allows to get further insights into the IOCTLs in the background. > + > +`Full documentation <../../driver-api/media/v4l2-dev.html#video-device-debugging>`__ > + > +**When is it appropriate to leave a debug print in the code?** > + > +Permanent debug statements have to be useful for a developer to troubleshoot > +why the driver misbehaves. Judging that is a bit more of an art than a > +science, but some guidelines are in the `Coding style guide > +<../../process/coding-style.html#printing-kernel-messages>`__) > + > +**Custom printk** > +~~~~~~~~~~~~~~~~~ > +Example: > +:: > + > + #define core_dbg(fmt, arg...) do { \ > + if (core_debug) \ > + printk(KERN_DEBUG pr_fmt("core: " fmt), ## arg); \ > + } while (0) > + > +**When should you do this?** > + > +It is better to just use a `pr_debug()`, which can later be turned on/off with > +dynamic debug. Additionally, a lot of drivers activate these prints via a > +variable like `core_debug` set by a module parameter. However, Module > +parameters `are not recommended anymore > +<https://lkml.org/lkml/2024/3/27/163>`_. > + > +Dynamic debug > +------------- > + > +Mechanism to filter what ends up in the kernel log by dis-/en-abling log > +messages. > + > +Prerequisite: `CONFIG_DYNAMIC_DEBUG` > + > +.. _valid_dyndbg_prints: > + > +Dynamic debug is only able to target: > + > +- `pr_debug()` > +- `dev_dbg()` > +- `print_hex_dump_debug()` > +- `print_hex_dump_bytes()` > + > +Therefore the usability of this tool is quite limited in the media subsystem, > +because, as of now, there is no uniform rule for adding debug prints to the codebase, > +resulting in a variety of ways these prints are implemented. > + > +Also, note that most debug statements are implemented as a variation of > +`dprintk`, which have to be activated via a parameter in respective module, > +dynamic debug is unable to do that step for you. > + > +Here is one example, that enables all available `pr_debug()`'s within the file: > +:: > + > + $ alias ddcmd='echo $* > /proc/dynamic_debug/control' > + $ ddcmd '-p; file v4l2-h264.c +p' > + $ grep =p /proc/dynamic_debug/control > + drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s" > + drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n" > + > +**When should you use this over** `Ftrace`_ **?** > + > +- When the code contains one of the :ref:`valid print statements <valid_dyndbg_prints_>`_ or when you have added multiple pr_debug() statements during development > +- When timing is not an issue, meaning if multiple `pr_debug()` statements in the code won't cause delays > +- When you care more about receiving specific log messages than tracing the pattern of how a function is called > + > +`Full documentation <../../admin-guide/dynamic-debug-howto.rst>`__ printk and friends is definitely for driver devs as you need to be able to recompile the kernel. > + > +Ftrace > +------ > + > +Prerequisite: `CONFIG_DYNAMIC_FTRACE` > + > +Trace whenever the `rkvdec_try_ctrl` function is called > +:: > + > + $ cd /sys/kernel/tracing > + $ echo function > /sys/kernel/tracing/current_tracer > + $ echo rkvdec_try_ctrl > set_ftrace_filter > + $ echo 1 > tracing_on > + $ cat trace > + h264parse0:sink-6359 [001] ...1. 172714.547523: rkvdec_try_ctrl <-try_or_set_cluster > + h264parse0:sink-6359 [005] ...1. 172714.567386: rkvdec_try_ctrl <-try_or_set_cluster > + > +Find out from where the calls originate > +:: > + > + $ echo 1 > options/func_stack_trace > + h264parse0:sink-6715 [002] ..... 172837.967762: rkvdec_try_ctrl <-try_or_set_cluster > + h264parse0:sink-6715 [002] ..... 172837.967773: <stack trace> > + => rkvdec_try_ctrl > + => try_or_set_cluster > + => try_set_ext_ctrls_common > + => try_set_ext_ctrls > + => v4l2_s_ext_ctrls > + => v4l_s_ext_ctrls > + ... > + h264parse0:sink-6715 [004] ..... 172837.985747: rkvdec_try_ctrl <-try_or_set_cluster > + h264parse0:sink-6715 [004] ..... 172837.985750: <stack trace> > + => rkvdec_try_ctrl > + => try_or_set_cluster > + => v4l2_ctrl_request_setup > + => rkvdec_run_preamble > + => rkvdec_h264_run > + => rkvdec_device_run > + ... > + > +Trace the children of a function call and show the return values (requires config `FUNCTION_GRAPH_RETVAL`) > +:: > + > + echo function_graph > current_tracer > + echo rkvdec_h264_run > set_graph_function > + echo 4 > max_graph_depth > + echo do_interrupt_handler mutex_* > set_graph_notrace > + echo 1 > options/funcgraph-retval > + ... > + 4) | rkvdec_h264_run [rockchip_vdec]() { > + 4) | v4l2_ctrl_find [videodev]() { > + ... > + 4) | rkvdec_run_preamble [rockchip_vdec]() { > + 4) 4.666 us | v4l2_m2m_next_buf [v4l2_mem2mem](); /* = 0xffff000005782000 */ > + ... > + 4) | v4l2_ctrl_request_setup [videodev]() { > + 4) 4.667 us | media_request_object_find [mc](); /* = 0xffff000005e3aa98 */ > + 4) 1.750 us | find_ref [videodev](); /* = 0xffff00000833b2a0 */ > + ... > + 4) 1.750 us | v4l2_m2m_buf_copy_metadata [v4l2_mem2mem](); /* = 0x0 */ > + 4) ! 114.333 us | } /* rkvdec_run_preamble [rockchip_vdec] = 0x0 */ > + 4) 2.334 us | v4l2_h264_init_reflist_builder [v4l2_h264](); /* = 0x3e */ > + ... > + 4) | v4l2_h264_build_b_ref_lists [v4l2_h264]() { > + ... > + 4) | rkvdec_run_postamble [rockchip_vdec]() { > + ... > + 4) ! 444.208 us | } /* rkvdec_h264_run [rockchip_vdec] = 0x0 */ > + > +**Reading the ftrace log** > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The `trace` file can be read just like any other file (`cat`, `tail`, `head`, > +`vim`, etc.), the size of the file is limited by the `buffer_size_kb` (``echo > +1000 > buffer_size_kb``). The `trace_pipe` will behave similar to the `trace` > +file, but whenever you read from the file the content is consumed. > + > +**Kernelshark** > +~~~~~~~~~~~~~~~ > + > +`Kernelshark documentation <https://kernelshark.org/Documentation.html>`_ > + > +`Full documentation <../../trace/ftrace.rst>`__ > + > +DebugFS > +------- > + > +Prerequisite: `CONFIG_DEBUG_FS` & `#include <linux/debugfs.h>` > + > +DebugFS differs from the other approaches of debugging, as it doesn't write messages to the kernel log nor add traces to the code. Instead it allows the developer to handle a set of files. > +With these files you can either store values of variables or make register/memory dumps or you can make these files writable and modify values/settings in the driver. > +Possible use-cases among others: > + > +- Store register values > +- Keep track of variables > +- Store errors > +- Store settings > +- Toggle a setting like debug on/off > +- Error injection > + > +This is especially useful, when the size of a data dump would be hard to digest as > +part of the general kernel log (for example when dumping raw bitstream data) or > +when you are not interested in all the values all the time, but with the > +possibility to inspect them. > + > +The general idea is: > + > +- Create a directory during probe (`struct dentry *parent = debugfs_create_dir("my_driver", NULL);`) > +- Create a file (`debugfs_create_u32("my_value", 444, parent, &my_variable);`) > + > + - In this example the file is found in `/sys/kernel/debug/my_driver/my_value` (with read permissions for user/group/all) > + - any update of `my_variable` will update the value in the file > + > +- Clean up the folder when removing the device (`debugfs_remove_recursive(parent);`) > + > +`Full documentation <../../filesystems/debugfs.rst>`__ > + > +Perf & alternatives > +------------------- > + > +The tools mentioned above provide ways to inspect kernel code, results, variable values, etc. > +Sometimes you have to find out first where to look and for those cases a box of performance tracking tools can help you to frame the issue. > + > +.. _performance: > + > +**Why should you do a performance analysis?** > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +A performance analysis is a good first step when among other reasons: > + > +- you cannot define the issue > +- you do not know where it occurs > +- the running system should not be interrupted or it is a remote system, where you cannot install a new module/kernel > + > +.. _linux-tools: > + > +**How to do a simple analysis with linux tools?** > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +For the start of a performance analysis, you can start with the usual tools like: > + > +- `top` / `htop` / `atop` (*get an overview of the system load, see spikes on specific processes*) > +- `mpstat -P ALL` (*look at the load distribution among CPUs*) > +- `iostat -x` (*observe input and output devices utilization and performance*) > +- `vmstat` (*overview of memory usage on the system*) > +- `pidstat` (*similar to* `vmstat` *but per process, to dial it down to the target*) > +- `strace -tp $PID` (*once you know the process, you can figure out how it communicates with the Kernel*) > + > +These should help to figure out restrict the areas to look at sufficiently. > + > +**Diving deeper with perf** > +~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The **perf** tool provides a series of metrics and events to further dial down on issues. > + > +Prerequisite: TBD > + > +Gather statistics data for a decoding job: (This example is on a RK3399 SoC with the rkvdec codec driver using the `fluster test suite <https://github.com/fluendo/fluster>`__) > +:: > + > + perf stat -d python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1 > + ... > + Performance counter stats for 'python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1 -v': > + > + 7794.23 msec task-clock:u # 0.697 CPUs utilized > + 0 context-switches:u # 0.000 /sec > + 0 cpu-migrations:u # 0.000 /sec > + 11901 page-faults:u # 1.527 K/sec > + 882671556 cycles:u # 0.113 GHz (95.79%) > + 711708695 instructions:u # 0.81 insn per cycle (95.79%) > + 10581935 branches:u # 1.358 M/sec (15.13%) > + 6871144 branch-misses:u # 64.93% of all branches (95.79%) > + 281716547 L1-dcache-loads:u # 36.144 M/sec (95.79%) > + 9019581 L1-dcache-load-misses:u # 3.20% of all L1-dcache accesses (95.79%) > + <not supported> LLC-loads:u > + <not supported> LLC-load-misses:u > + > + 11.180830431 seconds time elapsed > + > + 1.502318000 seconds user > + 6.377221000 seconds sys > + > +The availability of events and metrics depends on the system you are running. > + > +`Full documentation <https://perf.wiki.kernel.org/index.php/Main_Page>`__ > + > +**Perfetto** > +~~~~~~~~~~~~ > + > +TBD > + > +.. _error_checking: > + > +KASAN, UBSAN, lockdep and other error checkers > +---------------------------------------------- > + > +TBD > + > +.. _kernel_panic_analysis_tools: > + > +Kernel panic analysis tools > +--------------------------- I think this section is worthwhile to make more prominent as well. > + > + To analyse the crash dump please use `Kdump` & `Kexec`. > + > + TBD > + > + `Full documentation <../../admin-guide/kdump/kdump.rst>`__ > + > + In order to find the corresponding line in the code you can use `faddr2line > + <https://elixir.bootlin.com/linux/latest/source/scripts/faddr2line>`__, note > + that you need to enable `CONFIG_DEBUG_INFO` for that to work. > + > + An alternative to using `faddr2line` is the use of `objdump` (and it's > + derivatives for the different platforms like `aarch64-linux-gnu-objdump`), > + take this line as an example: > + > + `[ +0.000240] rkvdec_device_run+0x50/0x138 [rockchip_vdec]`. > + > + We can find the corresponding line of code by executing: > + :: > + > + aarch64-linux-gnu-objdump -dS drivers/staging/media/rkvdec/rockchip-vdec.ko | grep rkvdec_device_run\>: -A 40 > + 0000000000000ac8 <rkvdec_device_run>: > + ac8: d503201f nop > + acc: d503201f nop > + { > + ad0: d503233f paciasp > + ad4: a9bd7bfd stp x29, x30, [sp, #-48]! > + ad8: 910003fd mov x29, sp > + adc: a90153f3 stp x19, x20, [sp, #16] > + ae0: a9025bf5 stp x21, x22, [sp, #32] > + const struct rkvdec_coded_fmt_desc *desc = ctx->coded_fmt_desc; > + ae4: f9411814 ldr x20, [x0, #560] > + struct rkvdec_dev *rkvdec = ctx->dev; > + ae8: f9418015 ldr x21, [x0, #768] > + if (WARN_ON(!desc)) > + aec: b4000654 cbz x20, bb4 <rkvdec_device_run+0xec> > + ret = pm_runtime_resume_and_get(rkvdec->dev); > + af0: f943d2b6 ldr x22, [x21, #1952] > + ret = __pm_runtime_resume(dev, RPM_GET_PUT); > + af4: aa0003f3 mov x19, x0 > + af8: 52800081 mov w1, #0x4 // #4 > + afc: aa1603e0 mov x0, x22 > + b00: 94000000 bl 0 <__pm_runtime_resume> > + if (ret < 0) { > + b04: 37f80340 tbnz w0, #31, b6c <rkvdec_device_run+0xa4> > + dev_warn(rkvdec->dev, "Not good\n"); > + b08: f943d2a0 ldr x0, [x21, #1952] > + b0c: 90000001 adrp x1, 0 <rkvdec_try_ctrl-0x8> > + b10: 91000021 add x1, x1, #0x0 > + b14: 94000000 bl 0 <_dev_warn> > + *bad = 1; > + b18: d2800001 mov x1, #0x0 // #0 > + ... > + > + > + To find the matching line we just have to add `0x50` (from > + `rkvdec_device_run+0x50`) to `0xac8` (from `0000000000000ac8 > + <rkvdec_device_run>:`), which yields `0xb18` corresponding with `*bad = 1`. > + > +**Copyright** |copy| 2024 : Collabora > diff --git a/Documentation/media/guides/index.rst b/Documentation/media/guides/index.rst > new file mode 100644 > index 000000000000..0008966c0862 > --- /dev/null > +++ b/Documentation/media/guides/index.rst > @@ -0,0 +1,11 @@ > +.. SPDX-License-Identifier: GPL-2.0 > + > +============ > +Media Guides > +============ > + > +.. toctree:: > + :caption: Table of Contents > + :maxdepth: 1 > + > + debugging_issues > diff --git a/Documentation/media/index.rst b/Documentation/media/index.rst > index d056a9e99dca..5461876fc401 100644 > --- a/Documentation/media/index.rst > +++ b/Documentation/media/index.rst > @@ -7,6 +7,7 @@ Media Subsystem Documentation > .. toctree:: > :maxdepth: 2 > > + guides/index > ../userspace-api/media/index > ../driver-api/media/index.rst > ../admin-guide/media/index > Regards, Hans
Hi Sebastian, On 29.05.2024 18:22, Sebastian Fricke wrote: > Create a guides section for all documentation material, that isn't > strictly related to a specific piece of code. > > Provide a guide for developers on how to debug code with a focus on the > media subsystem. This document aims to provide a rough overview over the > possibilities and a rational to help choosing the right tool for the > given circumstances. Just like you, I'm not an English native speaker myself, so sometimes I make style mistakes that only seem obvious to someone whose native tongue is English. What I do in the case of documentation patches is running the whole thing through languagetool (https://www.languagetool.org). It's LGPL and I suppose some vim versions might have a plugin that lets you interface with it from within the buffer you're editing. On my side, when running it on the previous paragraph, it showed the following warning: [ADVISE_VBG[3] premium: false] The verb ‘help’ is used with an infinitive. -> (to choose, choose) > Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com> > --- > Documentation/media/guides/debugging_issues.rst | 477 ++++++++++++++++++++++++ > Documentation/media/guides/index.rst | 11 + > Documentation/media/index.rst | 1 + > 3 files changed, 489 insertions(+) > > diff --git a/Documentation/media/guides/debugging_issues.rst b/Documentation/media/guides/debugging_issues.rst > new file mode 100644 > index 000000000000..8fee749e7e6a > --- /dev/null > +++ b/Documentation/media/guides/debugging_issues.rst > @@ -0,0 +1,477 @@ > +.. SPDX-License-Identifier: GPL-2.0 I've noticed you've included the licence header in all your files. I also noticed though that Documentation/gpu/index.rst doesn't have it. I suppose it might've become a requiremente after that file was first submitted, but I'm not sure so I guess it's best to ask you directly. > +.. include:: <isonum.txt> > + > +=================================================== > +Debugging and tracing in the media subsystem > +=================================================== > + > +This document serves as a starting point and lookup for debugging device > +drivers in the media subsystem. > + > +.. contents:: > + :depth: 3 > + > +General debugging advice > +======================== > + > +Depending on the issue, a different set of tools is available to track down the > +problem or even to realize whether there is one in the first place. > + > +As a first step you have to figure out what kind of issue you want to debug. > +Depending on the answer, your methodology and choice of tools may vary. > +Some typical issues are listed below. The list is not exhaustive. I've noticed a lot of the information in this guide isn't specific to the Media subsystem. Isn't there a chance that you might be reduplicating this information? I guess a lot of it could fit into some sort of 'general driver debugging techniques' file, although I'm not sure where that would fit into the documentation hierarchy. > +What is the issue? > +------------------ > +1. Driver doesn't behave as expected > + > + TBD > + > +2. Running out of memory > + > + In this case you should use `Linux CLI tools <linux-tools_>`_ first to verify > + whether the issue stems from the Kernel or from userspace. > + > + TBD > + > +3. Kernel panic > + > + The kernel provides a number of `tools <kernel_panic_analysis_tools_>`_ to > + analyse a Kernel panic, use these first, once you have identified where the > + error occurs you can further analyse the issue by adding `debug logs <Printk > + & friends_>`_ to the codebase. > + > +4. Device doesn't probe > + > + TBD > + > +5. Driver too slow > + > + You should start with a `performance analysis <performance>`__, to pin down the problem area. > + > +Is timing a factor? > +------------------- > + > +It is important to understand if the problem you want to debug manifests itself > +consistently (i.e. given a set of inputs you always get the same, incorrect > +output), or inconsistently. If it manifests itself inconsistently, some timing > +factor might be at play. If inserting delays into the code does change the > +behavior, then quite likely timing is a factor. > + > + In this case using a simple `printk`_ won't work, a similar alternative is > + too use `trace_printk`_, which logs the debug messages to the trace file > + instead of the kernel log. > + > +Do I have root access to the system? > +------------------------------------ > + > +Are you easily able to replace the module in question or to install a new kernel? > + > +TBD > + > +.. _live_debug: > + > +Do I need to debug on a live system? > +------------------------------------ > + > +When the kernel is compiled with `Ftrace`_, this tool might be a good start, as you can trace specific functions to dial down to the sequence of actions that causes the misbehavior. > + > +TBD > + > +Is the outcome flaky? > +--------------------- > + > +We will call the outcome "flaky" if multiple iterations of the same routine with the same parameters result in different outcomes. > + > +TBD > + > +Available tools > +=============== > + > +Printk & friends > +---------------- > + > +These are derivatives of printf() with varying destination and support for being dynamically turned on or off, or lack thereof. > + > +.. _printk: > + > +**Simple printk** > +~~~~~~~~~~~~~~~~~ > + > +The classic, this can be used to great effect for quick and dirty development > +of new modules or to extract arbitrary necessary data for troubleshooting. > + > +Prerequisite: `CONFIG_PRINTK` (usually enabled by default) > + > +**Pros**: > + > +- No need to learn anything, simple to use > +- Easy to modify exactly to your needs (formatting of the data (See: `format specifiers <../../core-api/printk-formats.html#printk-specifiers>`__), visibility in the log) > +- Can cause delays in the execution of the code (beneficial to confirm whether timing is a factor) > + > +**Cons**: > + > +- Requires rebuilding the kernel/module, (See: `live_debug`_) > +- Can cause delays in the execution of the code > + > +`Full documentation <../../core-api/printk-basics.rst>`__ > + > +.. _trace_printk: > + > +**Trace_printk** > +~~~~~~~~~~~~~~~~ > + > +Prerequisite: `CONFIG_DYNAMIC_FTRACE` & `#include <linux/ftrace.h>` > + > +A tiny bit less comfortable to use than `printk`_, because you will have to > +read the messages from the trace file (See: `Reading the ftrace log`_ instead > +of from the kernel log, but very useful when printk adds unwanted delays into > +the code execution, causing issues to be flaky or hidden.) > + > +If the processing of this still causes timing issues then you can try `trace_puts()`. > + > +`Full Documentation <../../driver-api/basics.html#c.trace_printk>`__ > + > +**dev_dbg / v4l2_dbg** > +~~~~~~~~~~~~~~~~~~~~~~ > + > +- Difference between both? > + > + - v4l2_dbg utilizes v4l2_printk under the hood, which further uses printk directly, thus it cannot be targeted by dynamic debug > + - dev_dbg can be targeted by dynamic debug > + - v4l2_dbg has a more specific prefix format for the media subsystem, while dev_dbg only highlights the driver name and the location of the log > + > +**The dev_debug module parameter** > + > +Every video device provides a `dev_debug` parameter, which allows to get further insights into the IOCTLs in the background. > + > +`Full documentation <../../driver-api/media/v4l2-dev.html#video-device-debugging>`__ > + > +**When is it appropriate to leave a debug print in the code?** > + > +Permanent debug statements have to be useful for a developer to troubleshoot > +why the driver misbehaves. Judging that is a bit more of an art than a > +science, but some guidelines are in the `Coding style guide > +<../../process/coding-style.html#printing-kernel-messages>`__) > + > +**Custom printk** > +~~~~~~~~~~~~~~~~~ > +Example: > +:: > + > + #define core_dbg(fmt, arg...) do { \ > + if (core_debug) \ > + printk(KERN_DEBUG pr_fmt("core: " fmt), ## arg); \ > + } while (0) > + > +**When should you do this?** > + > +It is better to just use a `pr_debug()`, which can later be turned on/off with > +dynamic debug. Additionally, a lot of drivers activate these prints via a > +variable like `core_debug` set by a module parameter. However, Module > +parameters `are not recommended anymore > +<https://lkml.org/lkml/2024/3/27/163>`_. > + > +Dynamic debug > +------------- > + > +Mechanism to filter what ends up in the kernel log by dis-/en-abling log > +messages. > + > +Prerequisite: `CONFIG_DYNAMIC_DEBUG` > + > +.. _valid_dyndbg_prints: > + > +Dynamic debug is only able to target: > + > +- `pr_debug()` > +- `dev_dbg()` > +- `print_hex_dump_debug()` > +- `print_hex_dump_bytes()` > + > +Therefore the usability of this tool is quite limited in the media subsystem, > +because, as of now, there is no uniform rule for adding debug prints to the codebase, > +resulting in a variety of ways these prints are implemented. > + > +Also, note that most debug statements are implemented as a variation of > +`dprintk`, which have to be activated via a parameter in respective module, > +dynamic debug is unable to do that step for you. > + > +Here is one example, that enables all available `pr_debug()`'s within the file: > +:: > + > + $ alias ddcmd='echo $* > /proc/dynamic_debug/control' > + $ ddcmd '-p; file v4l2-h264.c +p' > + $ grep =p /proc/dynamic_debug/control > + drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s" > + drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n" > + > +**When should you use this over** `Ftrace`_ **?** > + > +- When the code contains one of the :ref:`valid print statements <valid_dyndbg_prints_>`_ or when you have added multiple pr_debug() statements during development > +- When timing is not an issue, meaning if multiple `pr_debug()` statements in the code won't cause delays > +- When you care more about receiving specific log messages than tracing the pattern of how a function is called > + > +`Full documentation <../../admin-guide/dynamic-debug-howto.rst>`__ > + > +Ftrace > +------ > + > +Prerequisite: `CONFIG_DYNAMIC_FTRACE` > + > +Trace whenever the `rkvdec_try_ctrl` function is called > +:: > + > + $ cd /sys/kernel/tracing > + $ echo function > /sys/kernel/tracing/current_tracer > + $ echo rkvdec_try_ctrl > set_ftrace_filter > + $ echo 1 > tracing_on > + $ cat trace > + h264parse0:sink-6359 [001] ...1. 172714.547523: rkvdec_try_ctrl <-try_or_set_cluster > + h264parse0:sink-6359 [005] ...1. 172714.567386: rkvdec_try_ctrl <-try_or_set_cluster > + > +Find out from where the calls originate > +:: > + > + $ echo 1 > options/func_stack_trace > + h264parse0:sink-6715 [002] ..... 172837.967762: rkvdec_try_ctrl <-try_or_set_cluster > + h264parse0:sink-6715 [002] ..... 172837.967773: <stack trace> > + => rkvdec_try_ctrl > + => try_or_set_cluster > + => try_set_ext_ctrls_common > + => try_set_ext_ctrls > + => v4l2_s_ext_ctrls > + => v4l_s_ext_ctrls > + ... > + h264parse0:sink-6715 [004] ..... 172837.985747: rkvdec_try_ctrl <-try_or_set_cluster > + h264parse0:sink-6715 [004] ..... 172837.985750: <stack trace> > + => rkvdec_try_ctrl > + => try_or_set_cluster > + => v4l2_ctrl_request_setup > + => rkvdec_run_preamble > + => rkvdec_h264_run > + => rkvdec_device_run > + ... > + > +Trace the children of a function call and show the return values (requires config `FUNCTION_GRAPH_RETVAL`) > +:: > + > + echo function_graph > current_tracer > + echo rkvdec_h264_run > set_graph_function > + echo 4 > max_graph_depth > + echo do_interrupt_handler mutex_* > set_graph_notrace > + echo 1 > options/funcgraph-retval > + ... > + 4) | rkvdec_h264_run [rockchip_vdec]() { > + 4) | v4l2_ctrl_find [videodev]() { > + ... > + 4) | rkvdec_run_preamble [rockchip_vdec]() { > + 4) 4.666 us | v4l2_m2m_next_buf [v4l2_mem2mem](); /* = 0xffff000005782000 */ > + ... > + 4) | v4l2_ctrl_request_setup [videodev]() { > + 4) 4.667 us | media_request_object_find [mc](); /* = 0xffff000005e3aa98 */ > + 4) 1.750 us | find_ref [videodev](); /* = 0xffff00000833b2a0 */ > + ... > + 4) 1.750 us | v4l2_m2m_buf_copy_metadata [v4l2_mem2mem](); /* = 0x0 */ > + 4) ! 114.333 us | } /* rkvdec_run_preamble [rockchip_vdec] = 0x0 */ > + 4) 2.334 us | v4l2_h264_init_reflist_builder [v4l2_h264](); /* = 0x3e */ > + ... > + 4) | v4l2_h264_build_b_ref_lists [v4l2_h264]() { > + ... > + 4) | rkvdec_run_postamble [rockchip_vdec]() { > + ... > + 4) ! 444.208 us | } /* rkvdec_h264_run [rockchip_vdec] = 0x0 */ > + > +**Reading the ftrace log** > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The `trace` file can be read just like any other file (`cat`, `tail`, `head`, > +`vim`, etc.), the size of the file is limited by the `buffer_size_kb` (``echo > +1000 > buffer_size_kb``). The `trace_pipe` will behave similar to the `trace` > +file, but whenever you read from the file the content is consumed. > + > +**Kernelshark** > +~~~~~~~~~~~~~~~ > + > +`Kernelshark documentation <https://kernelshark.org/Documentation.html>`_ > + > +`Full documentation <../../trace/ftrace.rst>`__ > + > +DebugFS > +------- > + > +Prerequisite: `CONFIG_DEBUG_FS` & `#include <linux/debugfs.h>` > + > +DebugFS differs from the other approaches of debugging, as it doesn't write messages to the kernel log nor add traces to the code. Instead it allows the developer to handle a set of files. > +With these files you can either store values of variables or make register/memory dumps or you can make these files writable and modify values/settings in the driver. > +Possible use-cases among others: > + > +- Store register values > +- Keep track of variables > +- Store errors > +- Store settings > +- Toggle a setting like debug on/off > +- Error injection > + > +This is especially useful, when the size of a data dump would be hard to digest as > +part of the general kernel log (for example when dumping raw bitstream data) or > +when you are not interested in all the values all the time, but with the > +possibility to inspect them. > + > +The general idea is: > + > +- Create a directory during probe (`struct dentry *parent = debugfs_create_dir("my_driver", NULL);`) > +- Create a file (`debugfs_create_u32("my_value", 444, parent, &my_variable);`) > + > + - In this example the file is found in `/sys/kernel/debug/my_driver/my_value` (with read permissions for user/group/all) > + - any update of `my_variable` will update the value in the file > + > +- Clean up the folder when removing the device (`debugfs_remove_recursive(parent);`) > + > +`Full documentation <../../filesystems/debugfs.rst>`__ > + > +Perf & alternatives > +------------------- > + > +The tools mentioned above provide ways to inspect kernel code, results, variable values, etc. > +Sometimes you have to find out first where to look and for those cases a box of performance tracking tools can help you to frame the issue. > + > +.. _performance: > + > +**Why should you do a performance analysis?** > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +A performance analysis is a good first step when among other reasons: > + > +- you cannot define the issue > +- you do not know where it occurs > +- the running system should not be interrupted or it is a remote system, where you cannot install a new module/kernel > + > +.. _linux-tools: > + > +**How to do a simple analysis with linux tools?** > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +For the start of a performance analysis, you can start with the usual tools like: > + > +- `top` / `htop` / `atop` (*get an overview of the system load, see spikes on specific processes*) > +- `mpstat -P ALL` (*look at the load distribution among CPUs*) > +- `iostat -x` (*observe input and output devices utilization and performance*) > +- `vmstat` (*overview of memory usage on the system*) > +- `pidstat` (*similar to* `vmstat` *but per process, to dial it down to the target*) > +- `strace -tp $PID` (*once you know the process, you can figure out how it communicates with the Kernel*) > + > +These should help to figure out restrict the areas to look at sufficiently. > + > +**Diving deeper with perf** > +~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The **perf** tool provides a series of metrics and events to further dial down on issues. > + > +Prerequisite: TBD > + > +Gather statistics data for a decoding job: (This example is on a RK3399 SoC with the rkvdec codec driver using the `fluster test suite <https://github.com/fluendo/fluster>`__) > +:: > + > + perf stat -d python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1 > + ... > + Performance counter stats for 'python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1 -v': > + > + 7794.23 msec task-clock:u # 0.697 CPUs utilized > + 0 context-switches:u # 0.000 /sec > + 0 cpu-migrations:u # 0.000 /sec > + 11901 page-faults:u # 1.527 K/sec > + 882671556 cycles:u # 0.113 GHz (95.79%) > + 711708695 instructions:u # 0.81 insn per cycle (95.79%) > + 10581935 branches:u # 1.358 M/sec (15.13%) > + 6871144 branch-misses:u # 64.93% of all branches (95.79%) > + 281716547 L1-dcache-loads:u # 36.144 M/sec (95.79%) > + 9019581 L1-dcache-load-misses:u # 3.20% of all L1-dcache accesses (95.79%) > + <not supported> LLC-loads:u > + <not supported> LLC-load-misses:u > + > + 11.180830431 seconds time elapsed > + > + 1.502318000 seconds user > + 6.377221000 seconds sys > + > +The availability of events and metrics depends on the system you are running. > + > +`Full documentation <https://perf.wiki.kernel.org/index.php/Main_Page>`__ > + > +**Perfetto** > +~~~~~~~~~~~~ > + > +TBD > + > +.. _error_checking: > + > +KASAN, UBSAN, lockdep and other error checkers > +---------------------------------------------- > + > +TBD > + > +.. _kernel_panic_analysis_tools: > + > +Kernel panic analysis tools > +--------------------------- > + > + To analyse the crash dump please use `Kdump` & `Kexec`. > + > + TBD > + > + `Full documentation <../../admin-guide/kdump/kdump.rst>`__ > + > + In order to find the corresponding line in the code you can use `faddr2line > + <https://elixir.bootlin.com/linux/latest/source/scripts/faddr2line>`__, note > + that you need to enable `CONFIG_DEBUG_INFO` for that to work. > + > + An alternative to using `faddr2line` is the use of `objdump` (and it's > + derivatives for the different platforms like `aarch64-linux-gnu-objdump`), > + take this line as an example: > + > + `[ +0.000240] rkvdec_device_run+0x50/0x138 [rockchip_vdec]`. > + > + We can find the corresponding line of code by executing: > + :: > + > + aarch64-linux-gnu-objdump -dS drivers/staging/media/rkvdec/rockchip-vdec.ko | grep rkvdec_device_run\>: -A 40 > + 0000000000000ac8 <rkvdec_device_run>: > + ac8: d503201f nop > + acc: d503201f nop > + { > + ad0: d503233f paciasp > + ad4: a9bd7bfd stp x29, x30, [sp, #-48]! > + ad8: 910003fd mov x29, sp > + adc: a90153f3 stp x19, x20, [sp, #16] > + ae0: a9025bf5 stp x21, x22, [sp, #32] > + const struct rkvdec_coded_fmt_desc *desc = ctx->coded_fmt_desc; > + ae4: f9411814 ldr x20, [x0, #560] > + struct rkvdec_dev *rkvdec = ctx->dev; > + ae8: f9418015 ldr x21, [x0, #768] > + if (WARN_ON(!desc)) > + aec: b4000654 cbz x20, bb4 <rkvdec_device_run+0xec> > + ret = pm_runtime_resume_and_get(rkvdec->dev); > + af0: f943d2b6 ldr x22, [x21, #1952] > + ret = __pm_runtime_resume(dev, RPM_GET_PUT); > + af4: aa0003f3 mov x19, x0 > + af8: 52800081 mov w1, #0x4 // #4 > + afc: aa1603e0 mov x0, x22 > + b00: 94000000 bl 0 <__pm_runtime_resume> > + if (ret < 0) { > + b04: 37f80340 tbnz w0, #31, b6c <rkvdec_device_run+0xa4> > + dev_warn(rkvdec->dev, "Not good\n"); > + b08: f943d2a0 ldr x0, [x21, #1952] > + b0c: 90000001 adrp x1, 0 <rkvdec_try_ctrl-0x8> > + b10: 91000021 add x1, x1, #0x0 > + b14: 94000000 bl 0 <_dev_warn> > + *bad = 1; > + b18: d2800001 mov x1, #0x0 // #0 > + ... > + > + > + To find the matching line we just have to add `0x50` (from > + `rkvdec_device_run+0x50`) to `0xac8` (from `0000000000000ac8 > + <rkvdec_device_run>:`), which yields `0xb18` corresponding with `*bad = 1`. > + > +**Copyright** |copy| 2024 : Collabora > diff --git a/Documentation/media/guides/index.rst b/Documentation/media/guides/index.rst > new file mode 100644 > index 000000000000..0008966c0862 > --- /dev/null > +++ b/Documentation/media/guides/index.rst > @@ -0,0 +1,11 @@ > +.. SPDX-License-Identifier: GPL-2.0 > + > +============ > +Media Guides > +============ > + > +.. toctree:: > + :caption: Table of Contents > + :maxdepth: 1 > + > + debugging_issues > diff --git a/Documentation/media/index.rst b/Documentation/media/index.rst > index d056a9e99dca..5461876fc401 100644 > --- a/Documentation/media/index.rst > +++ b/Documentation/media/index.rst > @@ -7,6 +7,7 @@ Media Subsystem Documentation > .. toctree:: > :maxdepth: 2 > > + guides/index > ../userspace-api/media/index > ../driver-api/media/index.rst > ../admin-guide/media/index > > -- > 2.25.1 Adrian Larumbe
Hi, On 6/28/24 12:08 PM, Adrian Larumbe wrote: > On 29.05.2024 18:22, Sebastian Fricke wrote: >> Create a guides section for all documentation material, that isn't >> strictly related to a specific piece of code. >> >> Provide a guide for developers on how to debug code with a focus on the >> media subsystem. This document aims to provide a rough overview over the >> possibilities and a rational to help choosing the right tool for the >> given circumstances. > Just like you, I'm not an English native speaker myself, so sometimes I make > style mistakes that only seem obvious to someone whose native tongue is English. > > What I do in the case of documentation patches is running the whole thing > through languagetool (https://www.languagetool.org). It's LGPL and I suppose > some vim versions might have a plugin that lets you interface with it from > within the buffer you're editing. On my side, when running it on the previous > paragraph, it showed the following warning: > > [ADVISE_VBG[3] premium: false] The verb ‘help’ is used with an infinitive. -> (to choose, choose) Ack that one. Also in that same sentence, s/rational/rationale/.
On Mon, Jun 24, 2024 at 01:23:10PM +0200, Hans Verkuil wrote: > Hi Sebastian, > > This is a good idea, but it needs more work. > > The main change is that it needs to be split into two parts: debugging > driver issues while developing the driver, and debugging userspace issues > where you want to figure out why (typically) an ioctl fails. I agree. For the userspace issues, I wonder where the right place to document debugging tips is. Centralizing them in a single location has upsides, but there are also upsides in splitting that information and storing it alongside the relevant guides. For instance, I wrote some time ago a detailed guide on how to configure media pipelines (see https://git.ideasonboard.org/doc/mc-v4l2.git/tree/). Getting a -EPIPE error when starting the pipeline is one of the most commonly encountered issue (https://git.ideasonboard.org/doc/mc-v4l2.git/tree/configuring-pipelines.rst#n237), and the guide explains how to solve by walking the user through pipeline configuration theory, as well as pointing to the relevant debugging tools. > On 29/05/2024 18:22, Sebastian Fricke wrote: > > Create a guides section for all documentation material, that isn't > > strictly related to a specific piece of code. > > > > Provide a guide for developers on how to debug code with a focus on the > > media subsystem. This document aims to provide a rough overview over the > > possibilities and a rational to help choosing the right tool for the > > given circumstances. > > > > Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com> > > --- > > Documentation/media/guides/debugging_issues.rst | 477 ++++++++++++++++++++++++ > > Documentation/media/guides/index.rst | 11 + > > Documentation/media/index.rst | 1 + > > 3 files changed, 489 insertions(+) > > > > diff --git a/Documentation/media/guides/debugging_issues.rst b/Documentation/media/guides/debugging_issues.rst > > new file mode 100644 > > index 000000000000..8fee749e7e6a > > --- /dev/null > > +++ b/Documentation/media/guides/debugging_issues.rst > > @@ -0,0 +1,477 @@ > > +.. SPDX-License-Identifier: GPL-2.0 > > +.. include:: <isonum.txt> > > + > > +=================================================== > > +Debugging and tracing in the media subsystem > > +=================================================== > > + > > +This document serves as a starting point and lookup for debugging device > > +drivers in the media subsystem. > > + > > +.. contents:: > > + :depth: 3 > > + > > +General debugging advice > > +======================== > > + > > +Depending on the issue, a different set of tools is available to track down the > > +problem or even to realize whether there is one in the first place. > > + > > +As a first step you have to figure out what kind of issue you want to debug. > > +Depending on the answer, your methodology and choice of tools may vary. > > +Some typical issues are listed below. The list is not exhaustive. > > + > > +What is the issue? > > +------------------ > > +1. Driver doesn't behave as expected > > + > > + TBD > > + > > +2. Running out of memory > > + > > + In this case you should use `Linux CLI tools <linux-tools_>`_ first to verify > > + whether the issue stems from the Kernel or from userspace. > > + > > + TBD > > + > > +3. Kernel panic > > + > > + The kernel provides a number of `tools <kernel_panic_analysis_tools_>`_ to > > + analyse a Kernel panic, use these first, once you have identified where the > > + error occurs you can further analyse the issue by adding `debug logs <Printk > > + & friends_>`_ to the codebase. > > + > > +4. Device doesn't probe > > + > > + TBD > > + > > +5. Driver too slow > > + > > + You should start with a `performance analysis <performance>`__, to pin down the problem area. > > I'm not sure how useful this section is. Too vague, really. > > > + > > +Is timing a factor? > > +------------------- > > + > > +It is important to understand if the problem you want to debug manifests itself > > +consistently (i.e. given a set of inputs you always get the same, incorrect > > +output), or inconsistently. If it manifests itself inconsistently, some timing > > +factor might be at play. If inserting delays into the code does change the > > +behavior, then quite likely timing is a factor. > > + > > + In this case using a simple `printk`_ won't work, a similar alternative is > > + too use `trace_printk`_, which logs the debug messages to the trace file > > too -> to > > > + instead of the kernel log. > > + > > +Do I have root access to the system? > > +------------------------------------ > > + > > +Are you easily able to replace the module in question or to install a new kernel? > > + > > +TBD > > + > > +.. _live_debug: > > + > > +Do I need to debug on a live system? > > +------------------------------------ > > + > > +When the kernel is compiled with `Ftrace`_, this tool might be a good start, as you can trace specific functions to dial down to the sequence of actions that causes the misbehavior. > > + > > +TBD > > + > > +Is the outcome flaky? > > +--------------------- > > + > > +We will call the outcome "flaky" if multiple iterations of the same routine with the same parameters result in different outcomes. > > + > > +TBD > > + > > +Available tools > > +=============== > > + > > +Printk & friends > > +---------------- > > + > > +These are derivatives of printf() with varying destination and support for being dynamically turned on or off, or lack thereof. > > + > > +.. _printk: > > + > > +**Simple printk** > > +~~~~~~~~~~~~~~~~~ > > + > > +The classic, this can be used to great effect for quick and dirty development > > +of new modules or to extract arbitrary necessary data for troubleshooting. > > + > > +Prerequisite: `CONFIG_PRINTK` (usually enabled by default) > > + > > +**Pros**: > > + > > +- No need to learn anything, simple to use > > +- Easy to modify exactly to your needs (formatting of the data (See: `format specifiers <../../core-api/printk-formats.html#printk-specifiers>`__), visibility in the log) > > +- Can cause delays in the execution of the code (beneficial to confirm whether timing is a factor) > > + > > +**Cons**: > > + > > +- Requires rebuilding the kernel/module, (See: `live_debug`_) > > +- Can cause delays in the execution of the code > > + > > +`Full documentation <../../core-api/printk-basics.rst>`__ > > + > > +.. _trace_printk: > > + > > +**Trace_printk** > > +~~~~~~~~~~~~~~~~ > > + > > +Prerequisite: `CONFIG_DYNAMIC_FTRACE` & `#include <linux/ftrace.h>` > > + > > +A tiny bit less comfortable to use than `printk`_, because you will have to > > +read the messages from the trace file (See: `Reading the ftrace log`_ instead > > +of from the kernel log, but very useful when printk adds unwanted delays into > > +the code execution, causing issues to be flaky or hidden.) > > + > > +If the processing of this still causes timing issues then you can try `trace_puts()`. > > + > > +`Full Documentation <../../driver-api/basics.html#c.trace_printk>`__ > > + > > +**dev_dbg / v4l2_dbg** > > +~~~~~~~~~~~~~~~~~~~~~~ > > + > > +- Difference between both? > > + > > + - v4l2_dbg utilizes v4l2_printk under the hood, which further uses printk directly, thus it cannot be targeted by dynamic debug > > + - dev_dbg can be targeted by dynamic debug > > + - v4l2_dbg has a more specific prefix format for the media subsystem, while dev_dbg only highlights the driver name and the location of the log > > + > > +**The dev_debug module parameter** > > This needs to be more prominent: it is very useful, esp. when debugging userspace problems. > > It is also always available, and all you need is root access in order to write to dev_debug. > > This is also mixed in with the various printk sections, but this should be a separate > section. > > > + > > +Every video device provides a `dev_debug` parameter, which allows to get further insights into the IOCTLs in the background. > > + > > +`Full documentation <../../driver-api/media/v4l2-dev.html#video-device-debugging>`__ > > + > > +**When is it appropriate to leave a debug print in the code?** > > + > > +Permanent debug statements have to be useful for a developer to troubleshoot > > +why the driver misbehaves. Judging that is a bit more of an art than a > > +science, but some guidelines are in the `Coding style guide > > +<../../process/coding-style.html#printing-kernel-messages>`__) > > + > > +**Custom printk** > > +~~~~~~~~~~~~~~~~~ > > +Example: > > +:: > > + > > + #define core_dbg(fmt, arg...) do { \ > > + if (core_debug) \ > > + printk(KERN_DEBUG pr_fmt("core: " fmt), ## arg); \ > > + } while (0) > > + > > +**When should you do this?** > > + > > +It is better to just use a `pr_debug()`, which can later be turned on/off with > > +dynamic debug. Additionally, a lot of drivers activate these prints via a > > +variable like `core_debug` set by a module parameter. However, Module > > +parameters `are not recommended anymore > > +<https://lkml.org/lkml/2024/3/27/163>`_. > > + > > +Dynamic debug > > +------------- > > + > > +Mechanism to filter what ends up in the kernel log by dis-/en-abling log > > +messages. > > + > > +Prerequisite: `CONFIG_DYNAMIC_DEBUG` > > + > > +.. _valid_dyndbg_prints: > > + > > +Dynamic debug is only able to target: > > + > > +- `pr_debug()` > > +- `dev_dbg()` > > +- `print_hex_dump_debug()` > > +- `print_hex_dump_bytes()` > > + > > +Therefore the usability of this tool is quite limited in the media subsystem, > > +because, as of now, there is no uniform rule for adding debug prints to the codebase, > > +resulting in a variety of ways these prints are implemented. > > + > > +Also, note that most debug statements are implemented as a variation of > > +`dprintk`, which have to be activated via a parameter in respective module, > > +dynamic debug is unable to do that step for you. > > + > > +Here is one example, that enables all available `pr_debug()`'s within the file: > > +:: > > + > > + $ alias ddcmd='echo $* > /proc/dynamic_debug/control' > > + $ ddcmd '-p; file v4l2-h264.c +p' > > + $ grep =p /proc/dynamic_debug/control > > + drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s" > > + drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n" > > + > > +**When should you use this over** `Ftrace`_ **?** > > + > > +- When the code contains one of the :ref:`valid print statements <valid_dyndbg_prints_>`_ or when you have added multiple pr_debug() statements during development > > +- When timing is not an issue, meaning if multiple `pr_debug()` statements in the code won't cause delays > > +- When you care more about receiving specific log messages than tracing the pattern of how a function is called > > + > > +`Full documentation <../../admin-guide/dynamic-debug-howto.rst>`__ > > printk and friends is definitely for driver devs as you need to be able to recompile the kernel. > > > + > > +Ftrace > > +------ > > + > > +Prerequisite: `CONFIG_DYNAMIC_FTRACE` > > + > > +Trace whenever the `rkvdec_try_ctrl` function is called > > +:: > > + > > + $ cd /sys/kernel/tracing > > + $ echo function > /sys/kernel/tracing/current_tracer > > + $ echo rkvdec_try_ctrl > set_ftrace_filter > > + $ echo 1 > tracing_on > > + $ cat trace > > + h264parse0:sink-6359 [001] ...1. 172714.547523: rkvdec_try_ctrl <-try_or_set_cluster > > + h264parse0:sink-6359 [005] ...1. 172714.567386: rkvdec_try_ctrl <-try_or_set_cluster > > + > > +Find out from where the calls originate > > +:: > > + > > + $ echo 1 > options/func_stack_trace > > + h264parse0:sink-6715 [002] ..... 172837.967762: rkvdec_try_ctrl <-try_or_set_cluster > > + h264parse0:sink-6715 [002] ..... 172837.967773: <stack trace> > > + => rkvdec_try_ctrl > > + => try_or_set_cluster > > + => try_set_ext_ctrls_common > > + => try_set_ext_ctrls > > + => v4l2_s_ext_ctrls > > + => v4l_s_ext_ctrls > > + ... > > + h264parse0:sink-6715 [004] ..... 172837.985747: rkvdec_try_ctrl <-try_or_set_cluster > > + h264parse0:sink-6715 [004] ..... 172837.985750: <stack trace> > > + => rkvdec_try_ctrl > > + => try_or_set_cluster > > + => v4l2_ctrl_request_setup > > + => rkvdec_run_preamble > > + => rkvdec_h264_run > > + => rkvdec_device_run > > + ... > > + > > +Trace the children of a function call and show the return values (requires config `FUNCTION_GRAPH_RETVAL`) > > +:: > > + > > + echo function_graph > current_tracer > > + echo rkvdec_h264_run > set_graph_function > > + echo 4 > max_graph_depth > > + echo do_interrupt_handler mutex_* > set_graph_notrace > > + echo 1 > options/funcgraph-retval > > + ... > > + 4) | rkvdec_h264_run [rockchip_vdec]() { > > + 4) | v4l2_ctrl_find [videodev]() { > > + ... > > + 4) | rkvdec_run_preamble [rockchip_vdec]() { > > + 4) 4.666 us | v4l2_m2m_next_buf [v4l2_mem2mem](); /* = 0xffff000005782000 */ > > + ... > > + 4) | v4l2_ctrl_request_setup [videodev]() { > > + 4) 4.667 us | media_request_object_find [mc](); /* = 0xffff000005e3aa98 */ > > + 4) 1.750 us | find_ref [videodev](); /* = 0xffff00000833b2a0 */ > > + ... > > + 4) 1.750 us | v4l2_m2m_buf_copy_metadata [v4l2_mem2mem](); /* = 0x0 */ > > + 4) ! 114.333 us | } /* rkvdec_run_preamble [rockchip_vdec] = 0x0 */ > > + 4) 2.334 us | v4l2_h264_init_reflist_builder [v4l2_h264](); /* = 0x3e */ > > + ... > > + 4) | v4l2_h264_build_b_ref_lists [v4l2_h264]() { > > + ... > > + 4) | rkvdec_run_postamble [rockchip_vdec]() { > > + ... > > + 4) ! 444.208 us | } /* rkvdec_h264_run [rockchip_vdec] = 0x0 */ > > + > > +**Reading the ftrace log** > > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +The `trace` file can be read just like any other file (`cat`, `tail`, `head`, > > +`vim`, etc.), the size of the file is limited by the `buffer_size_kb` (``echo > > +1000 > buffer_size_kb``). The `trace_pipe` will behave similar to the `trace` > > +file, but whenever you read from the file the content is consumed. > > + > > +**Kernelshark** > > +~~~~~~~~~~~~~~~ > > + > > +`Kernelshark documentation <https://kernelshark.org/Documentation.html>`_ > > + > > +`Full documentation <../../trace/ftrace.rst>`__ > > + > > +DebugFS > > +------- > > + > > +Prerequisite: `CONFIG_DEBUG_FS` & `#include <linux/debugfs.h>` > > + > > +DebugFS differs from the other approaches of debugging, as it doesn't write messages to the kernel log nor add traces to the code. Instead it allows the developer to handle a set of files. > > +With these files you can either store values of variables or make register/memory dumps or you can make these files writable and modify values/settings in the driver. > > +Possible use-cases among others: > > + > > +- Store register values > > +- Keep track of variables > > +- Store errors > > +- Store settings > > +- Toggle a setting like debug on/off > > +- Error injection > > + > > +This is especially useful, when the size of a data dump would be hard to digest as > > +part of the general kernel log (for example when dumping raw bitstream data) or > > +when you are not interested in all the values all the time, but with the > > +possibility to inspect them. > > + > > +The general idea is: > > + > > +- Create a directory during probe (`struct dentry *parent = debugfs_create_dir("my_driver", NULL);`) > > +- Create a file (`debugfs_create_u32("my_value", 444, parent, &my_variable);`) > > + > > + - In this example the file is found in `/sys/kernel/debug/my_driver/my_value` (with read permissions for user/group/all) > > + - any update of `my_variable` will update the value in the file > > + > > +- Clean up the folder when removing the device (`debugfs_remove_recursive(parent);`) > > + > > +`Full documentation <../../filesystems/debugfs.rst>`__ > > + > > +Perf & alternatives > > +------------------- > > + > > +The tools mentioned above provide ways to inspect kernel code, results, variable values, etc. > > +Sometimes you have to find out first where to look and for those cases a box of performance tracking tools can help you to frame the issue. > > + > > +.. _performance: > > + > > +**Why should you do a performance analysis?** > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +A performance analysis is a good first step when among other reasons: > > + > > +- you cannot define the issue > > +- you do not know where it occurs > > +- the running system should not be interrupted or it is a remote system, where you cannot install a new module/kernel > > + > > +.. _linux-tools: > > + > > +**How to do a simple analysis with linux tools?** > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +For the start of a performance analysis, you can start with the usual tools like: > > + > > +- `top` / `htop` / `atop` (*get an overview of the system load, see spikes on specific processes*) > > +- `mpstat -P ALL` (*look at the load distribution among CPUs*) > > +- `iostat -x` (*observe input and output devices utilization and performance*) > > +- `vmstat` (*overview of memory usage on the system*) > > +- `pidstat` (*similar to* `vmstat` *but per process, to dial it down to the target*) > > +- `strace -tp $PID` (*once you know the process, you can figure out how it communicates with the Kernel*) > > + > > +These should help to figure out restrict the areas to look at sufficiently. > > + > > +**Diving deeper with perf** > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +The **perf** tool provides a series of metrics and events to further dial down on issues. > > + > > +Prerequisite: TBD > > + > > +Gather statistics data for a decoding job: (This example is on a RK3399 SoC with the rkvdec codec driver using the `fluster test suite <https://github.com/fluendo/fluster>`__) > > +:: > > + > > + perf stat -d python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1 > > + ... > > + Performance counter stats for 'python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1 -v': > > + > > + 7794.23 msec task-clock:u # 0.697 CPUs utilized > > + 0 context-switches:u # 0.000 /sec > > + 0 cpu-migrations:u # 0.000 /sec > > + 11901 page-faults:u # 1.527 K/sec > > + 882671556 cycles:u # 0.113 GHz (95.79%) > > + 711708695 instructions:u # 0.81 insn per cycle (95.79%) > > + 10581935 branches:u # 1.358 M/sec (15.13%) > > + 6871144 branch-misses:u # 64.93% of all branches (95.79%) > > + 281716547 L1-dcache-loads:u # 36.144 M/sec (95.79%) > > + 9019581 L1-dcache-load-misses:u # 3.20% of all L1-dcache accesses (95.79%) > > + <not supported> LLC-loads:u > > + <not supported> LLC-load-misses:u > > + > > + 11.180830431 seconds time elapsed > > + > > + 1.502318000 seconds user > > + 6.377221000 seconds sys > > + > > +The availability of events and metrics depends on the system you are running. > > + > > +`Full documentation <https://perf.wiki.kernel.org/index.php/Main_Page>`__ > > + > > +**Perfetto** > > +~~~~~~~~~~~~ > > + > > +TBD > > + > > +.. _error_checking: > > + > > +KASAN, UBSAN, lockdep and other error checkers > > +---------------------------------------------- > > + > > +TBD > > + > > +.. _kernel_panic_analysis_tools: > > + > > +Kernel panic analysis tools > > +--------------------------- > > I think this section is worthwhile to make more prominent as well. This, as well as quite a few other sections in this document, are not specific to the media subsystem. Should they be moved to a more generic document, and maybe referenced from here ? > > + > > + To analyse the crash dump please use `Kdump` & `Kexec`. > > + > > + TBD > > + > > + `Full documentation <../../admin-guide/kdump/kdump.rst>`__ > > + > > + In order to find the corresponding line in the code you can use `faddr2line > > + <https://elixir.bootlin.com/linux/latest/source/scripts/faddr2line>`__, note > > + that you need to enable `CONFIG_DEBUG_INFO` for that to work. > > + > > + An alternative to using `faddr2line` is the use of `objdump` (and it's > > + derivatives for the different platforms like `aarch64-linux-gnu-objdump`), > > + take this line as an example: > > + > > + `[ +0.000240] rkvdec_device_run+0x50/0x138 [rockchip_vdec]`. > > + > > + We can find the corresponding line of code by executing: > > + :: > > + > > + aarch64-linux-gnu-objdump -dS drivers/staging/media/rkvdec/rockchip-vdec.ko | grep rkvdec_device_run\>: -A 40 > > + 0000000000000ac8 <rkvdec_device_run>: > > + ac8: d503201f nop > > + acc: d503201f nop > > + { > > + ad0: d503233f paciasp > > + ad4: a9bd7bfd stp x29, x30, [sp, #-48]! > > + ad8: 910003fd mov x29, sp > > + adc: a90153f3 stp x19, x20, [sp, #16] > > + ae0: a9025bf5 stp x21, x22, [sp, #32] > > + const struct rkvdec_coded_fmt_desc *desc = ctx->coded_fmt_desc; > > + ae4: f9411814 ldr x20, [x0, #560] > > + struct rkvdec_dev *rkvdec = ctx->dev; > > + ae8: f9418015 ldr x21, [x0, #768] > > + if (WARN_ON(!desc)) > > + aec: b4000654 cbz x20, bb4 <rkvdec_device_run+0xec> > > + ret = pm_runtime_resume_and_get(rkvdec->dev); > > + af0: f943d2b6 ldr x22, [x21, #1952] > > + ret = __pm_runtime_resume(dev, RPM_GET_PUT); > > + af4: aa0003f3 mov x19, x0 > > + af8: 52800081 mov w1, #0x4 // #4 > > + afc: aa1603e0 mov x0, x22 > > + b00: 94000000 bl 0 <__pm_runtime_resume> > > + if (ret < 0) { > > + b04: 37f80340 tbnz w0, #31, b6c <rkvdec_device_run+0xa4> > > + dev_warn(rkvdec->dev, "Not good\n"); > > + b08: f943d2a0 ldr x0, [x21, #1952] > > + b0c: 90000001 adrp x1, 0 <rkvdec_try_ctrl-0x8> > > + b10: 91000021 add x1, x1, #0x0 > > + b14: 94000000 bl 0 <_dev_warn> > > + *bad = 1; > > + b18: d2800001 mov x1, #0x0 // #0 > > + ... > > + > > + > > + To find the matching line we just have to add `0x50` (from > > + `rkvdec_device_run+0x50`) to `0xac8` (from `0000000000000ac8 > > + <rkvdec_device_run>:`), which yields `0xb18` corresponding with `*bad = 1`. > > + > > +**Copyright** |copy| 2024 : Collabora > > diff --git a/Documentation/media/guides/index.rst b/Documentation/media/guides/index.rst > > new file mode 100644 > > index 000000000000..0008966c0862 > > --- /dev/null > > +++ b/Documentation/media/guides/index.rst > > @@ -0,0 +1,11 @@ > > +.. SPDX-License-Identifier: GPL-2.0 > > + > > +============ > > +Media Guides > > +============ > > + > > +.. toctree:: > > + :caption: Table of Contents > > + :maxdepth: 1 > > + > > + debugging_issues > > diff --git a/Documentation/media/index.rst b/Documentation/media/index.rst > > index d056a9e99dca..5461876fc401 100644 > > --- a/Documentation/media/index.rst > > +++ b/Documentation/media/index.rst > > @@ -7,6 +7,7 @@ Media Subsystem Documentation > > .. toctree:: > > :maxdepth: 2 > > > > + guides/index > > ../userspace-api/media/index > > ../driver-api/media/index.rst > > ../admin-guide/media/index > >
diff --git a/Documentation/media/guides/debugging_issues.rst b/Documentation/media/guides/debugging_issues.rst new file mode 100644 index 000000000000..8fee749e7e6a --- /dev/null +++ b/Documentation/media/guides/debugging_issues.rst @@ -0,0 +1,477 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. include:: <isonum.txt> + +=================================================== +Debugging and tracing in the media subsystem +=================================================== + +This document serves as a starting point and lookup for debugging device +drivers in the media subsystem. + +.. contents:: + :depth: 3 + +General debugging advice +======================== + +Depending on the issue, a different set of tools is available to track down the +problem or even to realize whether there is one in the first place. + +As a first step you have to figure out what kind of issue you want to debug. +Depending on the answer, your methodology and choice of tools may vary. +Some typical issues are listed below. The list is not exhaustive. + +What is the issue? +------------------ +1. Driver doesn't behave as expected + + TBD + +2. Running out of memory + + In this case you should use `Linux CLI tools <linux-tools_>`_ first to verify + whether the issue stems from the Kernel or from userspace. + + TBD + +3. Kernel panic + + The kernel provides a number of `tools <kernel_panic_analysis_tools_>`_ to + analyse a Kernel panic, use these first, once you have identified where the + error occurs you can further analyse the issue by adding `debug logs <Printk + & friends_>`_ to the codebase. + +4. Device doesn't probe + + TBD + +5. Driver too slow + + You should start with a `performance analysis <performance>`__, to pin down the problem area. + +Is timing a factor? +------------------- + +It is important to understand if the problem you want to debug manifests itself +consistently (i.e. given a set of inputs you always get the same, incorrect +output), or inconsistently. If it manifests itself inconsistently, some timing +factor might be at play. If inserting delays into the code does change the +behavior, then quite likely timing is a factor. + + In this case using a simple `printk`_ won't work, a similar alternative is + too use `trace_printk`_, which logs the debug messages to the trace file + instead of the kernel log. + +Do I have root access to the system? +------------------------------------ + +Are you easily able to replace the module in question or to install a new kernel? + +TBD + +.. _live_debug: + +Do I need to debug on a live system? +------------------------------------ + +When the kernel is compiled with `Ftrace`_, this tool might be a good start, as you can trace specific functions to dial down to the sequence of actions that causes the misbehavior. + +TBD + +Is the outcome flaky? +--------------------- + +We will call the outcome "flaky" if multiple iterations of the same routine with the same parameters result in different outcomes. + +TBD + +Available tools +=============== + +Printk & friends +---------------- + +These are derivatives of printf() with varying destination and support for being dynamically turned on or off, or lack thereof. + +.. _printk: + +**Simple printk** +~~~~~~~~~~~~~~~~~ + +The classic, this can be used to great effect for quick and dirty development +of new modules or to extract arbitrary necessary data for troubleshooting. + +Prerequisite: `CONFIG_PRINTK` (usually enabled by default) + +**Pros**: + +- No need to learn anything, simple to use +- Easy to modify exactly to your needs (formatting of the data (See: `format specifiers <../../core-api/printk-formats.html#printk-specifiers>`__), visibility in the log) +- Can cause delays in the execution of the code (beneficial to confirm whether timing is a factor) + +**Cons**: + +- Requires rebuilding the kernel/module, (See: `live_debug`_) +- Can cause delays in the execution of the code + +`Full documentation <../../core-api/printk-basics.rst>`__ + +.. _trace_printk: + +**Trace_printk** +~~~~~~~~~~~~~~~~ + +Prerequisite: `CONFIG_DYNAMIC_FTRACE` & `#include <linux/ftrace.h>` + +A tiny bit less comfortable to use than `printk`_, because you will have to +read the messages from the trace file (See: `Reading the ftrace log`_ instead +of from the kernel log, but very useful when printk adds unwanted delays into +the code execution, causing issues to be flaky or hidden.) + +If the processing of this still causes timing issues then you can try `trace_puts()`. + +`Full Documentation <../../driver-api/basics.html#c.trace_printk>`__ + +**dev_dbg / v4l2_dbg** +~~~~~~~~~~~~~~~~~~~~~~ + +- Difference between both? + + - v4l2_dbg utilizes v4l2_printk under the hood, which further uses printk directly, thus it cannot be targeted by dynamic debug + - dev_dbg can be targeted by dynamic debug + - v4l2_dbg has a more specific prefix format for the media subsystem, while dev_dbg only highlights the driver name and the location of the log + +**The dev_debug module parameter** + +Every video device provides a `dev_debug` parameter, which allows to get further insights into the IOCTLs in the background. + +`Full documentation <../../driver-api/media/v4l2-dev.html#video-device-debugging>`__ + +**When is it appropriate to leave a debug print in the code?** + +Permanent debug statements have to be useful for a developer to troubleshoot +why the driver misbehaves. Judging that is a bit more of an art than a +science, but some guidelines are in the `Coding style guide +<../../process/coding-style.html#printing-kernel-messages>`__) + +**Custom printk** +~~~~~~~~~~~~~~~~~ +Example: +:: + + #define core_dbg(fmt, arg...) do { \ + if (core_debug) \ + printk(KERN_DEBUG pr_fmt("core: " fmt), ## arg); \ + } while (0) + +**When should you do this?** + +It is better to just use a `pr_debug()`, which can later be turned on/off with +dynamic debug. Additionally, a lot of drivers activate these prints via a +variable like `core_debug` set by a module parameter. However, Module +parameters `are not recommended anymore +<https://lkml.org/lkml/2024/3/27/163>`_. + +Dynamic debug +------------- + +Mechanism to filter what ends up in the kernel log by dis-/en-abling log +messages. + +Prerequisite: `CONFIG_DYNAMIC_DEBUG` + +.. _valid_dyndbg_prints: + +Dynamic debug is only able to target: + +- `pr_debug()` +- `dev_dbg()` +- `print_hex_dump_debug()` +- `print_hex_dump_bytes()` + +Therefore the usability of this tool is quite limited in the media subsystem, +because, as of now, there is no uniform rule for adding debug prints to the codebase, +resulting in a variety of ways these prints are implemented. + +Also, note that most debug statements are implemented as a variation of +`dprintk`, which have to be activated via a parameter in respective module, +dynamic debug is unable to do that step for you. + +Here is one example, that enables all available `pr_debug()`'s within the file: +:: + + $ alias ddcmd='echo $* > /proc/dynamic_debug/control' + $ ddcmd '-p; file v4l2-h264.c +p' + $ grep =p /proc/dynamic_debug/control + drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s" + drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n" + +**When should you use this over** `Ftrace`_ **?** + +- When the code contains one of the :ref:`valid print statements <valid_dyndbg_prints_>`_ or when you have added multiple pr_debug() statements during development +- When timing is not an issue, meaning if multiple `pr_debug()` statements in the code won't cause delays +- When you care more about receiving specific log messages than tracing the pattern of how a function is called + +`Full documentation <../../admin-guide/dynamic-debug-howto.rst>`__ + +Ftrace +------ + +Prerequisite: `CONFIG_DYNAMIC_FTRACE` + +Trace whenever the `rkvdec_try_ctrl` function is called +:: + + $ cd /sys/kernel/tracing + $ echo function > /sys/kernel/tracing/current_tracer + $ echo rkvdec_try_ctrl > set_ftrace_filter + $ echo 1 > tracing_on + $ cat trace + h264parse0:sink-6359 [001] ...1. 172714.547523: rkvdec_try_ctrl <-try_or_set_cluster + h264parse0:sink-6359 [005] ...1. 172714.567386: rkvdec_try_ctrl <-try_or_set_cluster + +Find out from where the calls originate +:: + + $ echo 1 > options/func_stack_trace + h264parse0:sink-6715 [002] ..... 172837.967762: rkvdec_try_ctrl <-try_or_set_cluster + h264parse0:sink-6715 [002] ..... 172837.967773: <stack trace> + => rkvdec_try_ctrl + => try_or_set_cluster + => try_set_ext_ctrls_common + => try_set_ext_ctrls + => v4l2_s_ext_ctrls + => v4l_s_ext_ctrls + ... + h264parse0:sink-6715 [004] ..... 172837.985747: rkvdec_try_ctrl <-try_or_set_cluster + h264parse0:sink-6715 [004] ..... 172837.985750: <stack trace> + => rkvdec_try_ctrl + => try_or_set_cluster + => v4l2_ctrl_request_setup + => rkvdec_run_preamble + => rkvdec_h264_run + => rkvdec_device_run + ... + +Trace the children of a function call and show the return values (requires config `FUNCTION_GRAPH_RETVAL`) +:: + + echo function_graph > current_tracer + echo rkvdec_h264_run > set_graph_function + echo 4 > max_graph_depth + echo do_interrupt_handler mutex_* > set_graph_notrace + echo 1 > options/funcgraph-retval + ... + 4) | rkvdec_h264_run [rockchip_vdec]() { + 4) | v4l2_ctrl_find [videodev]() { + ... + 4) | rkvdec_run_preamble [rockchip_vdec]() { + 4) 4.666 us | v4l2_m2m_next_buf [v4l2_mem2mem](); /* = 0xffff000005782000 */ + ... + 4) | v4l2_ctrl_request_setup [videodev]() { + 4) 4.667 us | media_request_object_find [mc](); /* = 0xffff000005e3aa98 */ + 4) 1.750 us | find_ref [videodev](); /* = 0xffff00000833b2a0 */ + ... + 4) 1.750 us | v4l2_m2m_buf_copy_metadata [v4l2_mem2mem](); /* = 0x0 */ + 4) ! 114.333 us | } /* rkvdec_run_preamble [rockchip_vdec] = 0x0 */ + 4) 2.334 us | v4l2_h264_init_reflist_builder [v4l2_h264](); /* = 0x3e */ + ... + 4) | v4l2_h264_build_b_ref_lists [v4l2_h264]() { + ... + 4) | rkvdec_run_postamble [rockchip_vdec]() { + ... + 4) ! 444.208 us | } /* rkvdec_h264_run [rockchip_vdec] = 0x0 */ + +**Reading the ftrace log** +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The `trace` file can be read just like any other file (`cat`, `tail`, `head`, +`vim`, etc.), the size of the file is limited by the `buffer_size_kb` (``echo +1000 > buffer_size_kb``). The `trace_pipe` will behave similar to the `trace` +file, but whenever you read from the file the content is consumed. + +**Kernelshark** +~~~~~~~~~~~~~~~ + +`Kernelshark documentation <https://kernelshark.org/Documentation.html>`_ + +`Full documentation <../../trace/ftrace.rst>`__ + +DebugFS +------- + +Prerequisite: `CONFIG_DEBUG_FS` & `#include <linux/debugfs.h>` + +DebugFS differs from the other approaches of debugging, as it doesn't write messages to the kernel log nor add traces to the code. Instead it allows the developer to handle a set of files. +With these files you can either store values of variables or make register/memory dumps or you can make these files writable and modify values/settings in the driver. +Possible use-cases among others: + +- Store register values +- Keep track of variables +- Store errors +- Store settings +- Toggle a setting like debug on/off +- Error injection + +This is especially useful, when the size of a data dump would be hard to digest as +part of the general kernel log (for example when dumping raw bitstream data) or +when you are not interested in all the values all the time, but with the +possibility to inspect them. + +The general idea is: + +- Create a directory during probe (`struct dentry *parent = debugfs_create_dir("my_driver", NULL);`) +- Create a file (`debugfs_create_u32("my_value", 444, parent, &my_variable);`) + + - In this example the file is found in `/sys/kernel/debug/my_driver/my_value` (with read permissions for user/group/all) + - any update of `my_variable` will update the value in the file + +- Clean up the folder when removing the device (`debugfs_remove_recursive(parent);`) + +`Full documentation <../../filesystems/debugfs.rst>`__ + +Perf & alternatives +------------------- + +The tools mentioned above provide ways to inspect kernel code, results, variable values, etc. +Sometimes you have to find out first where to look and for those cases a box of performance tracking tools can help you to frame the issue. + +.. _performance: + +**Why should you do a performance analysis?** +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A performance analysis is a good first step when among other reasons: + +- you cannot define the issue +- you do not know where it occurs +- the running system should not be interrupted or it is a remote system, where you cannot install a new module/kernel + +.. _linux-tools: + +**How to do a simple analysis with linux tools?** +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For the start of a performance analysis, you can start with the usual tools like: + +- `top` / `htop` / `atop` (*get an overview of the system load, see spikes on specific processes*) +- `mpstat -P ALL` (*look at the load distribution among CPUs*) +- `iostat -x` (*observe input and output devices utilization and performance*) +- `vmstat` (*overview of memory usage on the system*) +- `pidstat` (*similar to* `vmstat` *but per process, to dial it down to the target*) +- `strace -tp $PID` (*once you know the process, you can figure out how it communicates with the Kernel*) + +These should help to figure out restrict the areas to look at sufficiently. + +**Diving deeper with perf** +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The **perf** tool provides a series of metrics and events to further dial down on issues. + +Prerequisite: TBD + +Gather statistics data for a decoding job: (This example is on a RK3399 SoC with the rkvdec codec driver using the `fluster test suite <https://github.com/fluendo/fluster>`__) +:: + + perf stat -d python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1 + ... + Performance counter stats for 'python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1 -v': + + 7794.23 msec task-clock:u # 0.697 CPUs utilized + 0 context-switches:u # 0.000 /sec + 0 cpu-migrations:u # 0.000 /sec + 11901 page-faults:u # 1.527 K/sec + 882671556 cycles:u # 0.113 GHz (95.79%) + 711708695 instructions:u # 0.81 insn per cycle (95.79%) + 10581935 branches:u # 1.358 M/sec (15.13%) + 6871144 branch-misses:u # 64.93% of all branches (95.79%) + 281716547 L1-dcache-loads:u # 36.144 M/sec (95.79%) + 9019581 L1-dcache-load-misses:u # 3.20% of all L1-dcache accesses (95.79%) + <not supported> LLC-loads:u + <not supported> LLC-load-misses:u + + 11.180830431 seconds time elapsed + + 1.502318000 seconds user + 6.377221000 seconds sys + +The availability of events and metrics depends on the system you are running. + +`Full documentation <https://perf.wiki.kernel.org/index.php/Main_Page>`__ + +**Perfetto** +~~~~~~~~~~~~ + +TBD + +.. _error_checking: + +KASAN, UBSAN, lockdep and other error checkers +---------------------------------------------- + +TBD + +.. _kernel_panic_analysis_tools: + +Kernel panic analysis tools +--------------------------- + + To analyse the crash dump please use `Kdump` & `Kexec`. + + TBD + + `Full documentation <../../admin-guide/kdump/kdump.rst>`__ + + In order to find the corresponding line in the code you can use `faddr2line + <https://elixir.bootlin.com/linux/latest/source/scripts/faddr2line>`__, note + that you need to enable `CONFIG_DEBUG_INFO` for that to work. + + An alternative to using `faddr2line` is the use of `objdump` (and it's + derivatives for the different platforms like `aarch64-linux-gnu-objdump`), + take this line as an example: + + `[ +0.000240] rkvdec_device_run+0x50/0x138 [rockchip_vdec]`. + + We can find the corresponding line of code by executing: + :: + + aarch64-linux-gnu-objdump -dS drivers/staging/media/rkvdec/rockchip-vdec.ko | grep rkvdec_device_run\>: -A 40 + 0000000000000ac8 <rkvdec_device_run>: + ac8: d503201f nop + acc: d503201f nop + { + ad0: d503233f paciasp + ad4: a9bd7bfd stp x29, x30, [sp, #-48]! + ad8: 910003fd mov x29, sp + adc: a90153f3 stp x19, x20, [sp, #16] + ae0: a9025bf5 stp x21, x22, [sp, #32] + const struct rkvdec_coded_fmt_desc *desc = ctx->coded_fmt_desc; + ae4: f9411814 ldr x20, [x0, #560] + struct rkvdec_dev *rkvdec = ctx->dev; + ae8: f9418015 ldr x21, [x0, #768] + if (WARN_ON(!desc)) + aec: b4000654 cbz x20, bb4 <rkvdec_device_run+0xec> + ret = pm_runtime_resume_and_get(rkvdec->dev); + af0: f943d2b6 ldr x22, [x21, #1952] + ret = __pm_runtime_resume(dev, RPM_GET_PUT); + af4: aa0003f3 mov x19, x0 + af8: 52800081 mov w1, #0x4 // #4 + afc: aa1603e0 mov x0, x22 + b00: 94000000 bl 0 <__pm_runtime_resume> + if (ret < 0) { + b04: 37f80340 tbnz w0, #31, b6c <rkvdec_device_run+0xa4> + dev_warn(rkvdec->dev, "Not good\n"); + b08: f943d2a0 ldr x0, [x21, #1952] + b0c: 90000001 adrp x1, 0 <rkvdec_try_ctrl-0x8> + b10: 91000021 add x1, x1, #0x0 + b14: 94000000 bl 0 <_dev_warn> + *bad = 1; + b18: d2800001 mov x1, #0x0 // #0 + ... + + + To find the matching line we just have to add `0x50` (from + `rkvdec_device_run+0x50`) to `0xac8` (from `0000000000000ac8 + <rkvdec_device_run>:`), which yields `0xb18` corresponding with `*bad = 1`. + +**Copyright** |copy| 2024 : Collabora diff --git a/Documentation/media/guides/index.rst b/Documentation/media/guides/index.rst new file mode 100644 index 000000000000..0008966c0862 --- /dev/null +++ b/Documentation/media/guides/index.rst @@ -0,0 +1,11 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============ +Media Guides +============ + +.. toctree:: + :caption: Table of Contents + :maxdepth: 1 + + debugging_issues diff --git a/Documentation/media/index.rst b/Documentation/media/index.rst index d056a9e99dca..5461876fc401 100644 --- a/Documentation/media/index.rst +++ b/Documentation/media/index.rst @@ -7,6 +7,7 @@ Media Subsystem Documentation .. toctree:: :maxdepth: 2 + guides/index ../userspace-api/media/index ../driver-api/media/index.rst ../admin-guide/media/index
Create a guides section for all documentation material, that isn't strictly related to a specific piece of code. Provide a guide for developers on how to debug code with a focus on the media subsystem. This document aims to provide a rough overview over the possibilities and a rational to help choosing the right tool for the given circumstances. Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com> --- Documentation/media/guides/debugging_issues.rst | 477 ++++++++++++++++++++++++ Documentation/media/guides/index.rst | 11 + Documentation/media/index.rst | 1 + 3 files changed, 489 insertions(+)