mbox series

[bpf,v1,0/3] Fixes for BPF timer lockup and UAF

Message ID 20240709185440.1104957-1-memxor@gmail.com (mailing list archive)
Headers show
Series Fixes for BPF timer lockup and UAF | expand

Message

Kumar Kartikeya Dwivedi July 9, 2024, 6:54 p.m. UTC
The following patches contain fixes for timer lockups and a
use-after-free scenario.

This set proposes to fix the following lockup situation for BPF timers.

CPU 1					CPU 2

bpf_timer_cb				bpf_timer_cb
  timer_cb1				  timer_cb2
    bpf_timer_cancel(timer_cb2)		    bpf_timer_cancel(timer_cb1)
      hrtimer_cancel			      hrtimer_cancel

In this case, both callbacks will continue waiting for each other to
finish synchronously, causing a lockup.

The proposed fix adds support for tracking in-flight cancellations
*begun by other timer callbacks* for a particular BPF timer.  Whenever
preparing to call hrtimer_cancel, a callback will increment the target
timer's counter, then inspect its in-flight cancellations, and if
non-zero, return -EDEADLK to avoid situations where the target timer's
callback is waiting for its completion.

This does mean that in cases where a callback is fired and cancelled, it
will be unable to cancel any timers in that execution. This can be
alleviated by maintaining the list of waiting callbacks in bpf_hrtimer
and searching through it to avoid interdependencies, but this may
introduce additional delays in bpf_timer_cancel, in addition to
requiring extra state at runtime which may need to be allocated or
reused from bpf_hrtimer storage. Moreover, extra synchronization is
needed to delete these elements from the list of waiting callbacks once
hrtimer_cancel has finished.

The second patch is for a deadlock situation similar to above in
bpf_timer_cancel_and_free, but also a UAF scenario that can occur if
timer is armed before entering it, if hrtimer_running check causes the
hrtimer_cancel call to be skipped.

As seen above, synchronous hrtimer_cancel would lead to deadlock (if
same callback tries to free its timer, or two timers free each other),
therefore we queue work onto the global workqueue to ensure outstanding
timers are cancelled before bpf_hrtimer state is freed.

Further details are in the patches.

Kumar Kartikeya Dwivedi (3):
  bpf: Fail bpf_timer_cancel when callback is being cancelled
  bpf: Defer work in bpf_timer_cancel_and_free
  selftests/bpf: Add timer lockup selftest

 kernel/bpf/helpers.c                          | 99 +++++++++++++++----
 .../selftests/bpf/prog_tests/timer_lockup.c   | 65 ++++++++++++
 .../selftests/bpf/progs/timer_lockup.c        | 85 ++++++++++++++++
 3 files changed, 232 insertions(+), 17 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/timer_lockup.c
 create mode 100644 tools/testing/selftests/bpf/progs/timer_lockup.c


base-commit: 528269fe117f3b19461733a0fa408c55a5270aff

Comments

patchwork-bot+netdevbpf@kernel.org July 10, 2024, 11:30 p.m. UTC | #1
Hello:

This series was applied to bpf/bpf.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Tue,  9 Jul 2024 18:54:37 +0000 you wrote:
> The following patches contain fixes for timer lockups and a
> use-after-free scenario.
> 
> This set proposes to fix the following lockup situation for BPF timers.
> 
> CPU 1					CPU 2
> 
> [...]

Here is the summary with links:
  - [bpf,v1,1/3] bpf: Fail bpf_timer_cancel when callback is being cancelled
    https://git.kernel.org/bpf/bpf/c/d4523831f07a
  - [bpf,v1,2/3] bpf: Defer work in bpf_timer_cancel_and_free
    https://git.kernel.org/bpf/bpf/c/a6fcd19d7eac
  - [bpf,v1,3/3] selftests/bpf: Add timer lockup selftest
    (no matching commit)

You are awesome, thank you!