mbox series

[v3,0/5] ftrace: Fix fprobe with function graph accounting

Message ID 20250220202009.689253424@goodmis.org (mailing list archive)
Headers show
Series ftrace: Fix fprobe with function graph accounting | expand

Message

Steven Rostedt Feb. 20, 2025, 8:20 p.m. UTC
Heiko Carstens reported[1] a bug when running the ftrace selftests.
After running them, he noticed that the enabled_functions file had
all functions enabled in it. That means something was registered to
ftrace that shouldn't be.

The issue was with the accounting of the new fprobe logic which was
built on top of the function graph infrastructure. Patch 3 of this
series is the fix for that bug, but while debugging that, 3 other
accounting bugs were discovered.

The bug fix for the reported bug was that fprobes would update its counter
every time a new fprobe was added, even if a new fprobe was attached to a
function that was already attached to another fprobe. But when removing the
fprobe, it would not decrement the counter of an fprobe with a duplicate.
This left the fprobe ops attached to function graph and when it removed the
last probe from the hash, it would create an empty filter hash which would
tell function graph that it wanted to trace all functions. The solution was
to always decrement the counter when a fprobe was removed.

The first of the other bugs was that when enabling a second subops to the
function graph infrastructure, it would add all functions to be called back
and not just the functions for the two subops for tracing. This was due to
the creation of the filter hash for the manager ops that controls the
subops. The first iteration where the manage ops hash was NULL was mistaken
as an "empty hash" which means to trace all functions.

The second bug was when adding a function to the hash where the hash already
had that function, it would allocate and create a new entry for it.  This
leaves duplicates and causes unnecessary overhead and memory wastage.

The third bug was when the last fprobe was removed, it would just unregister
from function graph but it did not remove the function from its own ops
hash. When adding a new fprobe, it would not only enable the function for
that new fprobe, but it would also enable the function of the last fprobe
that was removed.

Finally, a test was added to check and fail if any of the bugs were
introduced, with the exception of the duplicate hash entries, as that
was more of a memory waste and not something visible by user space.

[1] https://lore.kernel.org/all/20250217114918.10397-A-hca@linux.ibm.com/

Changes since v2: https://lore.kernel.org/linux-trace-kernel/20250219220436.498041541@goodmis.org/

- Have append_hashes() return EMPTY_HASH instead of NULL when the resulting
  hash is empty.

Changes since v1: https://lore.kernel.org/all/20250218193033.338823920@goodmis.org/

- Moved the "Always unregister fgraph function from ops" to patch 3

- Change the "Fix accounting" patch to do the update in
  fprobe_graph_remove_ips() to make it match fprobe_graph_add_ips().


Steven Rostedt (5):
      ftrace: Fix accounting of adding subops to a manager ops
      ftrace: Do not add duplicate entries in subops manager ops
      fprobe: Always unregister fgraph function from ops
      fprobe: Fix accounting of when to unregister from function graph
      selftests/ftrace: Update fprobe test to check enabled_functions file

----
 kernel/trace/fprobe.c                              | 12 ++---
 kernel/trace/ftrace.c                              | 36 ++++++++++-----
 .../ftrace/test.d/dynevent/add_remove_fprobe.tc    | 54 ++++++++++++++++++++++
 3 files changed, 84 insertions(+), 18 deletions(-)