mbox series

[v4,0/2] tracing/ring-buffer: Fix wakeup of ring buffer waiters

Message ID 20240312121506.972039112@goodmis.org (mailing list archive)
Headers show
Series tracing/ring-buffer: Fix wakeup of ring buffer waiters | expand

Message

Steven Rostedt March 12, 2024, 12:15 p.m. UTC
[ Sorry for the spam, but I just noticed kernel-test-robot complained
  about KernelDoc format, which this fixes ]


A patch was sent to "fix" the wait_index variable that is used to help with
waking of waiters on the ring buffer. The patch was rejected, but I started
looking at associated code. Discussing it on IRC with Mathieu Desnoyers
we discovered a design flaw.

The waiter reads "wait_index" then enters a "wait loop". After adding
itself to the wait queue, if the buffer is filled or a signal is pending
it will exit out. If wait_index is different, it will also exit out.
(Note, I noticed that the check for wait_index was after the schedule()
call and should have been before it, but that's besides the point).

The race is what happens if the waker updates the wait_index,
a new waiter comes in and reads the updated wait_index before it adds
itself to the queue, it will miss the update!

These patches fix the design by converting the ring_buffer_wait()
to use the wait_event_interruptible() interface.

Then the wait_to_pipe() caller will pass its own condition into
the ring_buffer_wait() to be checked by the wait_event_interruptible().

- The first patch restructures the ring_buffer_wait() to use the
  wait_event_interruptible() logic. It does not change the interface,
  but adds a local "cond" condition callback function that will be
  what it defaults to in the second patch if NULL is passed in.

  The default cond function is just a "wait once" function. That
  is, the first time it is called (before the wait_event_interruptible()
  calls schedule) will set the "once" variable to one and return
  false. The next time it is called (after wait_event_interruptible()
  wakes up) it will return true to break out of the loop.

- The second patch changes the ring_buffer_wait() interface to allow
  the caller to define its own "cond" callback. That will be checked
  in wait_event_interruptible() along with checking if the proper
  amount of data in the ring buffer has been hit.

Changes since v3: https://lore.kernel.org/all/20240312120252.998983805@goodmis.org/

- I should have checked before sending v2, but after I did, I noticed
  that kernel-test-robot reported that the cond and data parameters
  added to ring_buffer_wait() were not added to the kerneldoc above it.

Changes since v2: https://lore.kernel.org/all/20240308202402.234176464@goodmis.org/

- Patches 1-3 of v2 have been accepted.

- Instead of breaking ring_buffer_wait() into three functions that do
  a prepare_to_wait(), wait() and finish_wait(), have ring_buffer_wait()
  take a condition callback function and a data parameter.

  The ring_buffer_wait() now uses a wait_event_interruptible() code,
  and added a helper function to do check its own conditions, and also
  to call the passed in condition function and allow the caller to
  specify other conditions to break out of the event loop.

Changes since v1: https://lore.kernel.org/lkml/20240308183816.676883229@goodmis.org/

- My tests triggered a warning about calling a mutex_lock() after a
  prepare_to_wait() that changed the task's state. Convert the affected
  mutex over to a spinlock.


Steven Rostedt (Google) (2):
      ring-buffer: Use wait_event_interruptible() in ring_buffer_wait()
      tracing/ring-buffer: Fix wait_on_pipe() race

----
 include/linux/ring_buffer.h  |   4 +-
 include/linux/trace_events.h |   5 +-
 kernel/trace/ring_buffer.c   | 119 ++++++++++++++++++++++++++-----------------
 kernel/trace/trace.c         |  43 +++++++++++-----
 4 files changed, 109 insertions(+), 62 deletions(-)