Message ID | 201805192327.JIF05779.OQFJFStOOMLFVH@I-love.SAKURA.ne.jp (mailing list archive)
---|---
State | New, archived |
On Sat 19-05-18 23:27:09, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Jan Kara wrote:
> > > Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
> > > the necessary precautions against racing with bdi unregistration.
> >
> > Yes, this patch will solve the NULL pointer dereference bug. But is it OK to
> > leave a list_empty(&wb->work_list) == false situation? Who takes over the
> > role of making list_empty(&wb->work_list) == true?
>
> syzbot is again reporting the same NULL pointer dereference.
>
>   general protection fault in wb_workfn (2)
>   https://syzkaller.appspot.com/bug?id=e0818ccb7e46190b3f1038b0c794299208ed4206

Gaah... So we are still missing something.

> Didn't we overlook something obvious in commit b8b784958eccbf8f ("bdi:
> Fix oops in wb_workfn()") ?
>
> At first, I thought that that commit would solve the NULL pointer dereference
> bug. But what does
>
>  	if (!list_empty(&wb->work_list))
> -		mod_delayed_work(bdi_wq, &wb->dwork, 0);
> +		wb_wakeup(wb);
>  	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
>  		wb_wakeup_delayed(wb);
>
> mean?
>
> static void wb_wakeup(struct bdi_writeback *wb)
> {
> 	spin_lock_bh(&wb->work_lock);
> 	if (test_bit(WB_registered, &wb->state))
> 		mod_delayed_work(bdi_wq, &wb->dwork, 0);
> 	spin_unlock_bh(&wb->work_lock);
> }
>
> It means nothing but "we don't call mod_delayed_work() if the WB_registered
> bit was already cleared".

Exactly.

> But what if the WB_registered bit is not yet cleared when we hit the
> wb_wakeup_delayed() path?
>
> void wb_wakeup_delayed(struct bdi_writeback *wb)
> {
> 	unsigned long timeout;
>
> 	timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
> 	spin_lock_bh(&wb->work_lock);
> 	if (test_bit(WB_registered, &wb->state))
> 		queue_delayed_work(bdi_wq, &wb->dwork, timeout);
> 	spin_unlock_bh(&wb->work_lock);
> }
>
> add_timer() is called because (presumably) timeout > 0. And after that
> timeout expires, __queue_work() is called even if the WB_registered bit was
> cleared before the timeout expired, isn't it?

Yes.

> void delayed_work_timer_fn(struct timer_list *t)
> {
> 	struct delayed_work *dwork = from_timer(dwork, t, timer);
>
> 	/* should have been called from irqsafe timer with irq already off */
> 	__queue_work(dwork->cpu, dwork->wq, &dwork->work);
> }
>
> Then, wb_workfn() is after all scheduled even if we check for the
> WB_registered bit, isn't it?

It can be queued after the WB_registered bit is cleared, but it cannot be
queued after mod_delayed_work(bdi_wq, &wb->dwork, 0) has finished. That
function deletes the pending timer (the timer cannot be armed again because
WB_registered is cleared) and queues what should be the last round of
wb_workfn().

> Then, don't we need to check that
>
> 	mod_delayed_work(bdi_wq, &wb->dwork, 0);
> 	flush_delayed_work(&wb->dwork);
>
> is really waiting for completion? At least, shouldn't we try the debug
> output below (not only for debugging this report but also as something
> generally desirable)?
>
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 7441bd9..ccec8cd 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -376,8 +376,10 @@ static void wb_shutdown(struct bdi_writeback *wb)
>  	 * tells wb_workfn() that @wb is dying and its work_list needs to
>  	 * be drained no matter what.
>  	 */
> -	mod_delayed_work(bdi_wq, &wb->dwork, 0);
> -	flush_delayed_work(&wb->dwork);
> +	if (!mod_delayed_work(bdi_wq, &wb->dwork, 0))
> +		printk(KERN_WARNING "wb_shutdown: mod_delayed_work() failed\n");

A false return from mod_delayed_work() just means that there was no timer
armed. That is a valid situation if there are no dirty data.

> +	if (!flush_delayed_work(&wb->dwork))
> +		printk(KERN_WARNING "wb_shutdown: flush_delayed_work() failed\n");

And this is valid as well (although unlikely) if the work managed to
complete on another CPU before flush_delayed_work() was called. So I don't
think your warnings will help us much. But yes, we need to debug this
somehow. For now I have no idea what could still be going wrong.

								Honza
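To make the guard Jan describes concrete outside of kernel context, here is a minimal user-space C sketch of the same pattern. It is a model of the idea, not the kernel code: all names (wb_model, wakeup_model, shutdown_model) are invented for illustration. The point is that the "registered" flag is read under the same lock the shutdown path holds while clearing it, so once shutdown_model() returns, this queueing path can never fire again.

```c
/* Minimal user-space sketch of the wb_wakeup() guard pattern.
 * Not kernel code; all names here are invented for illustration. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct wb_model {
	pthread_mutex_t work_lock;	/* models wb->work_lock */
	bool registered;		/* models the WB_registered bit */
	int queued;			/* models queueing wb->dwork */
};

/* models wb_wakeup(): queue only while still registered */
static void wakeup_model(struct wb_model *wb)
{
	pthread_mutex_lock(&wb->work_lock);
	if (wb->registered)
		wb->queued++;
	pthread_mutex_unlock(&wb->work_lock);
}

/* models the start of wb_shutdown(): clear the flag under the lock */
static void shutdown_model(struct wb_model *wb)
{
	pthread_mutex_lock(&wb->work_lock);
	wb->registered = false;
	pthread_mutex_unlock(&wb->work_lock);
}

int main(void)
{
	struct wb_model wb = {
		.work_lock = PTHREAD_MUTEX_INITIALIZER,
		.registered = true,
	};

	wakeup_model(&wb);	/* queues: flag still set */
	shutdown_model(&wb);
	wakeup_model(&wb);	/* no-op: flag already cleared */
	printf("queued %d time(s)\n", wb.queued);	/* prints 1 */
	return 0;
}
```

Compiled with `cc -pthread`, this prints "queued 1 time(s)": the second wakeup is a no-op precisely because the flag test and the flag clear are serialized by the same lock. The discussion below is about the path that does not take this lock at all.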
Jan Kara wrote:
> > void delayed_work_timer_fn(struct timer_list *t)
> > {
> > 	struct delayed_work *dwork = from_timer(dwork, t, timer);
> >
> > 	/* should have been called from irqsafe timer with irq already off */
> > 	__queue_work(dwork->cpu, dwork->wq, &dwork->work);
> > }
> >
> > Then, wb_workfn() is after all scheduled even if we check for the
> > WB_registered bit, isn't it?
>
> It can be queued after the WB_registered bit is cleared, but it cannot be
> queued after mod_delayed_work(bdi_wq, &wb->dwork, 0) has finished. That
> function deletes the pending timer (the timer cannot be armed again because
> WB_registered is cleared) and queues what should be the last round of
> wb_workfn().

mod_delayed_work() deletes the pending timer, but it does not wait for an
already-invoked timer handler to complete, because it uses del_timer()
rather than del_timer_sync(). Then, what happens if __queue_work() is
executed almost concurrently from two CPUs: one call from
mod_delayed_work(bdi_wq, &wb->dwork, 0) in the wb_shutdown() path (which
runs without spin_lock_bh(&wb->work_lock)), and the other from the
delayed_work_timer_fn() path (which runs without checking the WB_registered
bit under spin_lock_bh(&wb->work_lock))?

wb_wakeup_delayed() {
  spin_lock_bh(&wb->work_lock);
  if (test_bit(WB_registered, &wb->state)) // succeeds
    queue_delayed_work(bdi_wq, &wb->dwork, timeout) {
      queue_delayed_work_on(WORK_CPU_UNBOUND, bdi_wq, &wb->dwork, timeout) {
        if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT,
                              work_data_bits(&wb->dwork.work))) { // succeeds
          __queue_delayed_work(WORK_CPU_UNBOUND, bdi_wq, &wb->dwork, timeout) {
            add_timer(timer); // schedules delayed_work_timer_fn()
          }
        }
      }
    }
  spin_unlock_bh(&wb->work_lock);
}

delayed_work_timer_fn() {
  // del_timer() already returns false at this point because this timer
  // is already inside its handler. But suppose something here took long
  // enough that __queue_work() from the wb_shutdown() path finished first?
  __queue_work(WORK_CPU_UNBOUND, bdi_wq, &wb->dwork.work) {
    insert_work(pwq, work, worklist, work_flags);
  }
}

wb_shutdown() {
  mod_delayed_work(bdi_wq, &wb->dwork, 0) {
    mod_delayed_work_on(WORK_CPU_UNBOUND, bdi_wq, &wb->dwork, 0) {
      ret = try_to_grab_pending(&wb->dwork.work, true, &flags) {
        if (likely(del_timer(&wb->dwork.timer)))
          // fails because already in delayed_work_timer_fn()
          return 1;
        if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT,
                              work_data_bits(&wb->dwork.work)))
          // fails because already set by queue_delayed_work()
          return 0;
        // returns 1 or -ENOENT after doing something?
      }
      if (ret >= 0)
        __queue_delayed_work(WORK_CPU_UNBOUND, bdi_wq, &wb->dwork, 0) {
          __queue_work(WORK_CPU_UNBOUND, bdi_wq, &wb->dwork.work) {
            insert_work(pwq, work, worklist, work_flags);
          }
        }
    }
  }
}
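The interleaving hypothesized above can be hand-ordered in a small user-space model. The sketch below uses invented names, runs deterministically, and is not kernel code: it does not prove the kernel race, it only narrates the suspected steps in order, showing why a flag check in the queueing paths cannot stop a timer handler that has already been entered. Whether the real workqueue code actually permits this ordering is exactly what the thread is debating.

```c
/* Deterministic user-space model of the hypothesized interleaving.
 * All names are invented; this illustrates the suspected ordering only. */
#include <stdbool.h>
#include <stdio.h>

struct model {
	bool registered;	/* models WB_registered */
	bool timer_armed;	/* models the dwork timer */
	bool in_timer_fn;	/* handler entered; del_timer() is too late */
	int inserted;		/* insert_work() calls */
	int flushed;		/* insertions flush_delayed_work() waited for */
};

int main(void)
{
	struct model m = { .registered = true };

	/* wb_wakeup_delayed(): flag checked under the lock, timer armed */
	if (m.registered)
		m.timer_armed = true;

	/* timer expires: handler entered, then stalls before __queue_work() */
	m.timer_armed = false;
	m.in_timer_fn = true;

	/* wb_shutdown() on another CPU clears the flag... */
	m.registered = false;

	/* ...then mod_delayed_work(): del_timer() finds no armed timer */
	if (m.timer_armed)
		m.timer_armed = false;	/* not taken: handler already running */

	/* the "last" round is queued and flushed anyway */
	m.inserted++;
	m.flushed = m.inserted;		/* flush_delayed_work() returns */

	/* stalled handler resumes: __queue_work() with no registered check */
	m.inserted++;

	printf("insertions after the flushed \"last\" round: %d\n",
	       m.inserted - m.flushed);	/* prints 1 */
	return 0;
}
```

The one insertion left over after the "flushed" count is the model's analogue of wb_workfn() running after wb_shutdown() believed the work list was drained, i.e. the window the trace above is pointing at.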
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 7441bd9..ccec8cd 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -376,8 +376,10 @@ static void wb_shutdown(struct bdi_writeback *wb)
 	 * tells wb_workfn() that @wb is dying and its work_list needs to
 	 * be drained no matter what.
 	 */
-	mod_delayed_work(bdi_wq, &wb->dwork, 0);
-	flush_delayed_work(&wb->dwork);
+	if (!mod_delayed_work(bdi_wq, &wb->dwork, 0))
+		printk(KERN_WARNING "wb_shutdown: mod_delayed_work() failed\n");
+	if (!flush_delayed_work(&wb->dwork))
+		printk(KERN_WARNING "wb_shutdown: flush_delayed_work() failed\n");
 	WARN_ON(!list_empty(&wb->work_list));
 	/*
 	 * Make sure bit gets cleared after shutdown is finished. Matches with