
[v3] timer: Fix a race condition between timer's callback and destroying code

Message ID 20240627003134.3447175-1-rkir@google.com (mailing list archive)
State New, archived
Series [v3] timer: Fix a race condition between timer's callback and destroying code

Commit Message

Roman Kiryanov June 27, 2024, 12:31 a.m. UTC
`timerlist_run_timers` provides no mechanism to
make sure the data pointed to by `opaque` is valid
when calling a timer's callback: another thread
could be running that is destroying the timer's
opaque data.

With this change `timer_del` blocks if the
timer's callback is running, so it is safe to
destroy the timer's data once `timer_del` returns.

Signed-off-by: Roman Kiryanov <rkir@google.com>
---
v2: rebased to the right branch and removed
    Google-specific tags from the commit message.

v3: if a timer's callback happens to be running
    (cb_running), wait until all timers are done.
    qatomic_read/qemu_event_reset could race, so
    timer_del might wait for one extra loop of
    timers to finish.

 include/qemu/timer.h |  1 +
 util/qemu-timer.c    | 11 +++++++++++
 2 files changed, 12 insertions(+)

Comments

Paolo Bonzini June 27, 2024, 1:27 p.m. UTC | #1
On Thu, Jun 27, 2024 at 2:32 AM Roman Kiryanov <rkir@google.com> wrote:
> +        if (qatomic_read(&ts->cb_running)) {
> +            qemu_event_wait(&timer_list->timers_done_ev);
> +        }

qemu_event_wait() already has the right atomic magic, and
ts->cb_running is both redundant (in general), and I think racy (as
implemented in this patch). This stuff is really hard to get right. At
the very least you need a store-release when clearing the flag, and I
think it also has to be read under the mutex (I'm not sure if there's
anything else, I haven't done a full analysis).
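
For illustration, the release/acquire pairing meant here would look roughly
like this against the hunks quoted below (a sketch only, assuming the flag
approach were kept; it does not claim to make the patch correct):

    /* in timerlist_run_timers(): publish the callback's side effects
     * before the cleared flag can be observed */
    cb(opaque);
    qatomic_store_release(&ts->cb_running, false);

    /* in timer_del(): pair it with an acquire load, sampled while
     * active_timers_lock is still held, and only wait after unlocking */
    bool cb_was_running = qatomic_load_acquire(&ts->cb_running);
    qemu_mutex_unlock(&timer_list->active_timers_lock);
    if (cb_was_running) {
        qemu_event_wait(&timer_list->timers_done_ev);
    }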

But especially, you haven't justified in the commit message _why_ you
need this. Since this problem is not timer-specific, using
aio_bh_schedule_oneshot() or aio_wait_bh_oneshot() to synchronize
everything with the AioContext thread seems like a superior solution
to me.

Paolo

>      }
>  }
>
> @@ -571,9 +576,15 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>          cb = ts->cb;
>          opaque = ts->opaque;
>
> +        /* prevent timer_del from returning while cb(opaque)
> +         * is still running (by waiting for timers_done_ev).
> +         */
> +        qatomic_set(&ts->cb_running, true);
> +
>          /* run the callback (the timer list can be modified) */
>          qemu_mutex_unlock(&timer_list->active_timers_lock);
>          cb(opaque);
> +        qatomic_set(&ts->cb_running, false);
>          qemu_mutex_lock(&timer_list->active_timers_lock);
>
>          progress = true;
> --
> 2.45.2.741.gdbec12cfda-goog
>
Roman Kiryanov June 27, 2024, 4:12 p.m. UTC | #2
On Thu, Jun 27, 2024 at 6:27 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On Thu, Jun 27, 2024 at 2:32 AM Roman Kiryanov <rkir@google.com> wrote:
> > +        if (qatomic_read(&ts->cb_running)) {
> > +            qemu_event_wait(&timer_list->timers_done_ev);
> > +        }
>
> qemu_event_wait() already has the right atomic magic, and
> ts->cb_running is both redundant (in general), and I think racy (as
> implemented in this patch).

I added cb_running to avoid waiting for timers_done_ev if we know our
cb is done.

> But especially, you haven't justified in the commit message _why_ you
> need this.

I mentioned the problem of cleanup racing with the timer's callback function
in the current shape of QEMU.

> Since this problem is not timer-specific,

It would be nice to fix all the places then.

> using
> aio_bh_schedule_oneshot() or aio_wait_bh_oneshot() to synchronize
> everything with the AioContext thread seems like a superior solution
> to me.

Could you please elaborate? The problem we want to solve is this:

void myThreadFunc() {
    CallbackState callbackState;
    QEMUTimer timer;

    timer_init(&timer, myClockType, myScale, &myTimerCallbackFunc,
               &callbackState);
    ...
    timer_del(&timer);
}

Currently, myTimerCallbackFunc could fire after myThreadFunc exits
(if timer_del runs between qemu_mutex_unlock and cb(opaque) in
timerlist_run_timers) and callbackState gets destroyed.
Paolo Bonzini June 27, 2024, 5:15 p.m. UTC | #3
On Thu, Jun 27, 2024 at 6:12 PM Roman Kiryanov <rkir@google.com> wrote:
>
> On Thu, Jun 27, 2024 at 6:27 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On Thu, Jun 27, 2024 at 2:32 AM Roman Kiryanov <rkir@google.com> wrote:
> > > +        if (qatomic_read(&ts->cb_running)) {
> > > +            qemu_event_wait(&timer_list->timers_done_ev);
> > > +        }
> >
> > qemu_event_wait() already has the right atomic magic, and
> > ts->cb_running is both redundant (in general), and I think racy (as
> > implemented in this patch).
>
> I added cb_running to avoid waiting for timers_done_ev if we know our
> cb is done.

Yes, but it's very tricky. Assuming we want to fix it in the timer
core, the QemuEvent should be enough, no need to optimize it. On the
other hand, I'm still worried about deadlocks (more below).

> > But especially, you haven't justified in the commit message _why_ you
> > need this.
>
> I mentioned the problem of cleanup racing with the timer's callback function
> in the current shape of QEMU.

Yes, but it was not clear which threads were involved. It is clear
now that you have a function in a separate thread, creating a timer
that fires in the main QEMU event loop.

> > using
> > aio_bh_schedule_oneshot() or aio_wait_bh_oneshot() to synchronize
> > everything with the AioContext thread seems like a superior solution
> > to me.
>
> Could you please elaborate? The problem we want to solve is this:
>
> void myThreadFunc() {
>     CallbackState callbackState;
>     QEMUTimer timer;
>
>     timer_init(&timer, myClockType, myScale, &myTimerCallbackFunc,
> &callbackState);
>     ...
>     timer_del(&timer);
> }
>
> Currently, myTimerCallbackFunc could fire after myThreadFunc exits
> (if timer_del runs between qemu_mutex_unlock and cb(opaque) in
> timerlist_run_timers) and callbackState gets destroyed.

Ok, got it now. I agree that qemu_event_wait() is safe for you here
because you are in a completely separate thread. But I'm worried that
it causes deadlocks in QEMU where the timer callback and the timer_del
run in the same thread.

I think the easiest options would be:

1) if possible, allocate the timer and the callbackState statically in
the device.

2) use "aio_wait_bh_oneshot(qemu_get_aio_context(), [](void
*opaque){}, NULL);" after timer_del(). You can also put the timer and
the callbackState in a RAII wrapper, so that aio_wait_bh_oneshot() is
executed when the RAII wrapper is destructed
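
For concreteness, a minimal sketch of option 2 in C, with a named no-op
function standing in for the empty lambda. CallbackState and
myTimerCallbackFunc are the placeholder names from the example earlier in
the thread, the clock type and scale are arbitrary, and the sketch assumes
the calling thread is one from which aio_wait_bh_oneshot() may be used:

#include "qemu/osdep.h"
#include "qemu/main-loop.h"
#include "qemu/timer.h"
#include "block/aio-wait.h"

typedef struct CallbackState {
    int value;                       /* placeholder payload */
} CallbackState;

static void myTimerCallbackFunc(void *opaque)
{
    CallbackState *state = opaque;   /* the real callback work goes here */
    (void)state;
}

/* No-op bottom half, used purely as a fence. */
static void noopBh(void *opaque)
{
}

void myThreadFunc(void)
{
    CallbackState callbackState;
    QEMUTimer timer;

    timer_init(&timer, QEMU_CLOCK_VIRTUAL, SCALE_MS,
               &myTimerCallbackFunc, &callbackState);
    /* ... */
    timer_del(&timer);

    /* The no-op BH runs in the main-loop AioContext, i.e. the thread
     * that dispatches the timer callback; once it has run, a callback
     * that was already in flight has finished, so callbackState can be
     * destroyed safely when this function returns. */
    aio_wait_bh_oneshot(qemu_get_aio_context(), noopBh, NULL);
}

The RAII variant is the same thing with timer_del() and the fence moved
into the wrapper's destructor.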

Another thing that you could do is to use a shared_ptr<> for the
timer+callbackState combo, and pass a weak_ptr<> to the timer. Then:

- at the beginning of the timer callback, you upgrade the weak_ptr
with lock(), and if that fails, return

- at the end of myThreadFunc, you destruct the shared_ptr before
deleting the timer.

I'm not sure how you'd pass the weak_ptr/shared_ptr to a callback
(Rust has Weak::into_raw/Weak::from_raw, but I don't know C++ well
enough). That may be overkill.

Paolo
Roman Kiryanov June 27, 2024, 5:53 p.m. UTC | #4
Paolo, thank you for your comments.

On Thu, Jun 27, 2024 at 10:16 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> I think the easiest options would be:
>
> 1) if possible, allocate the timer and the callbackState statically in
> the device.

I don't think this assumption works for either QEMU or us.

> 2) use "aio_wait_bh_oneshot(qemu_get_aio_context(), [](void
> *opaque){}, NULL);" after timer_del(). You can also put the timer and
> the callbackState in a RAII wrapper, so that aio_wait_bh_oneshot() is
> executed when the RAII wrapper is destructed

My understanding is that this will work as a fence waiting for all
timers to finish. If so, maybe there is value in putting it into QEMU
(as timers_join() or even as timer_join(QEMUTimer *ts)), in case one
day you decide to implement it in a more efficient way?
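
For the record, a possible shape for such a helper, purely hypothetical
(timer_join is not an existing QEMU API) and reusing the no-op-BH fence
from the sketch above; it only covers timers dispatched by the main-loop
AioContext:

/* Hypothetical: delete the timer, then wait until a callback that may
 * already be in flight has finished running. */
void timer_join(QEMUTimer *ts)
{
    timer_del(ts);
    /* noopBh as in the earlier sketch; a native implementation could
     * instead wait only when this timer's callback is running. */
    aio_wait_bh_oneshot(qemu_get_aio_context(), noopBh, NULL);
}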

> Another thing that you could do is to use a shared_ptr<> for the
> timer+callbackState combo, and pass a weak_ptr<> to the timer. Then:
>
> - at the beginning of the timer, you upgrade the weak_ptr with lock()
> and if it fails, return
>
> I'm not sure how you'd pass the weak_ptr/shared_ptr to a callback

I suspect this is not possible in plain C++ without modifying QEMU or
generating code at runtime.

I would go with your aio_wait_bh_oneshot suggestion. Please consider
adding it to QEMU, as I suggested above. I can send a patch.
Patrick Leis Aug. 14, 2024, 9:12 p.m. UTC | #5
On Thu, Jun 27, 2024 at 10:55 AM Roman Kiryanov <rkir@google.com> wrote:

> Paolo, thank you for your comments.
>
> On Thu, Jun 27, 2024 at 10:16 AM Paolo Bonzini <pbonzini@redhat.com>
> wrote:
> > I think the easiest options would be:
> >
> > 1) if possible, allocate the timer and the callbackState statically in
> > the device.
>
> I think this assumption is not good for both QEMU and us.
>
> > 2) use "aio_wait_bh_oneshot(qemu_get_aio_context(), [](void
> > *opaque){}, NULL);" after timer_del(). You can also put the timer and
> > the callbackState in a RAII wrapper, so that aio_wait_bh_oneshot() is
> > executed when the RAII wrapper is destructed
>
> My understanding is that this will work as a fence waiting for all
> timers to finish. If so, maybe there is a value to put it into QEMU
> (as times_joins() or even as timer_join(QEMUTimer *ts)) if one day
> you decide to implement it in a more efficient way?
>
> > Another thing that you could do is to use a shared_ptr<> for the
> > timer+callbackState combo, and pass a weak_ptr<> to the timer. Then:
> >
> > - at the beginning of the timer, you upgrade the weak_ptr with lock()
> > and if it fails, return
> >
> > I'm not sure how you'd pass the weak_ptr/shared_ptr to a callback
>
> I suspect this is not possible in plain C++ without modifying QEMU or
> code generating at runtime.
>
> I would go with your aio_wait_bh_oneshot suggestion. Please consider
> adding it to QEMU as I pointed above. I can send a patch.
>

Hey - this is an important race condition to resolve; can we get some
attention back on this patchset, please? It's pressing.

Patrick
Paolo Bonzini Aug. 14, 2024, 10:10 p.m. UTC | #6
On 8/14/24 23:12, Patrick Leis wrote:
>     I would go with your aio_wait_bh_oneshot suggestion. Please consider
>     adding it to QEMU as I pointed above. I can send a patch.
> 
> Hey - this is an important race condition to resolve, can we get some 
> attention back on this patchset please.  It's pressing.

Sorry about the delay.

The patch is good, and there is no reason why it should break.  But 
without a user (and as far as I understand there is no occurrence of 
this race condition in upstream QEMU code) there is no reason to include 
it in QEMU.  QEMU is not a library, and therefore there should be no 
unused functions in the code base.

You can keep it as a solution for the race condition in your code; 
depending on how your fork is organized, it may be kept in qemu-timer.c 
or in a file that is part of your additional device models.

Thanks,

Paolo

Patch

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 5ce83c7911..c2c98f79f4 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -89,6 +89,7 @@  struct QEMUTimer {
     QEMUTimer *next;
     int attributes;
     int scale;
+    bool cb_running;
 };
 
 extern QEMUTimerListGroup main_loop_tlg;
diff --git a/util/qemu-timer.c b/util/qemu-timer.c
index 213114be68..5ec379dc43 100644
--- a/util/qemu-timer.c
+++ b/util/qemu-timer.c
@@ -370,6 +370,7 @@  void timer_init_full(QEMUTimer *ts,
     ts->scale = scale;
     ts->attributes = attributes;
     ts->expire_time = -1;
+    ts->cb_running = false;
 }
 
 void timer_deinit(QEMUTimer *ts)
@@ -435,6 +436,10 @@  void timer_del(QEMUTimer *ts)
         qemu_mutex_lock(&timer_list->active_timers_lock);
         timer_del_locked(timer_list, ts);
         qemu_mutex_unlock(&timer_list->active_timers_lock);
+
+        if (qatomic_read(&ts->cb_running)) {
+            qemu_event_wait(&timer_list->timers_done_ev);
+        }
     }
 }
 
@@ -571,9 +576,15 @@  bool timerlist_run_timers(QEMUTimerList *timer_list)
         cb = ts->cb;
         opaque = ts->opaque;
 
+        /* prevent timer_del from returning while cb(opaque)
+         * is still running (by waiting for timers_done_ev).
+         */
+        qatomic_set(&ts->cb_running, true);
+
         /* run the callback (the timer list can be modified) */
         qemu_mutex_unlock(&timer_list->active_timers_lock);
         cb(opaque);
+        qatomic_set(&ts->cb_running, false);
         qemu_mutex_lock(&timer_list->active_timers_lock);
 
         progress = true;