Message ID | 20180503162626.27753-1-jack@suse.cz |
---|---|
State | New, archived |
On Thu, May 03, 2018 at 06:26:26PM +0200, Jan Kara wrote:
> Syzbot has reported that it can hit a NULL pointer dereference in
> wb_workfn() due to wb->bdi->dev being NULL.
[...]
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 47d7c151fcba..471d863958bc 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1961,7 +1961,7 @@ void wb_workfn(struct work_struct *work)
>  	}
>  
>  	if (!list_empty(&wb->work_list))
> -		mod_delayed_work(bdi_wq, &wb->dwork, 0);
> +		wb_wakeup(wb);
>  	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
>  		wb_wakeup_delayed(wb);

Yup, looks fine - I can't see any more of these open-coded wakeups,
either, so we should be good here.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

As an aside, why is half the wb infrastructure in fs/fs-writeback.c
and the other half in mm/backing-dev.c? It seems pretty random as to
what is where, e.g. wb_wakeup() and wb_wakeup_delayed() are almost
identical, but are in completely different files...

Cheers,

Dave.
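To make Dave's observation concrete, here is a hypothetical userspace sketch (not kernel code) of what the two helpers share. As far as the queueing decision goes, both take wb->work_lock, test WB_registered and queue wb->dwork on bdi_wq. wb_wakeup() uses mod_delayed_work() with a zero delay, which re-arms a work that is already pending, while wb_wakeup_delayed() uses queue_delayed_work() with a delay derived from dirty_writeback_interval, which leaves an already-pending work alone. The toy_* names, the single timer field and the omitted lock are assumptions of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct bdi_writeback; not kernel code. */
struct toy_wb_timer {
	bool registered;    /* models the WB_registered bit */
	bool pending;       /* models "wb->dwork is queued" */
	unsigned long fire; /* models when the pending dwork would run */
};

enum toy_arm_mode {
	TOY_ARM_MOD,   /* like mod_delayed_work(): re-arm even if pending */
	TOY_ARM_QUEUE, /* like queue_delayed_work(): no-op if pending */
};

/*
 * The skeleton both wakeup helpers share: refuse to queue for an
 * unregistered wb, otherwise (re)arm the work. The kernel performs this
 * test under wb->work_lock; the lock is omitted here because the model
 * is single-threaded. Returns true if this call armed the toy timer.
 */
bool toy_arm(struct toy_wb_timer *wb, unsigned long now,
	     unsigned long delay, enum toy_arm_mode mode)
{
	if (!wb->registered)
		return false;
	if (wb->pending && mode == TOY_ARM_QUEUE)
		return false;       /* keep the existing expiry */
	wb->pending = true;
	wb->fire = now + delay;
	return true;
}

/* Models wb_wakeup(): run as soon as possible. */
bool toy_wakeup_now(struct toy_wb_timer *wb, unsigned long now)
{
	return toy_arm(wb, now, 0, TOY_ARM_MOD);
}

/* Models wb_wakeup_delayed(): run after the periodic writeback interval. */
bool toy_wakeup_delayed(struct toy_wb_timer *wb, unsigned long now,
			unsigned long interval)
{
	return toy_arm(wb, now, interval, TOY_ARM_QUEUE);
}
```

In this model a single helper with a mode argument covers both callers; that is one possible shape for the cleanup discussed in this thread, not a proposal made in it.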
On 5/3/18 3:55 PM, Dave Chinner wrote:
> On Thu, May 03, 2018 at 06:26:26PM +0200, Jan Kara wrote:
>> Syzbot has reported that it can hit a NULL pointer dereference in
>> wb_workfn() due to wb->bdi->dev being NULL.
[...]
> As an aside, why is half the wb infrastructure in fs/fs-writeback.c
> and the other half in mm/backing-dev.c? It seems pretty random as to
> what is where, e.g. wb_wakeup() and wb_wakeup_delayed() are almost
> identical, but are in completely different files...

That's always bothered me too; it's due for a cleanup that brings it all
into one location.
Jan Kara wrote:
> Make wb_workfn() use wb_wakeup() for requeueing the work which takes all
> the necessary precautions against racing with bdi unregistration.

Yes, this patch will solve the NULL pointer dereference bug. But is it OK to
leave the list_empty(&wb->work_list) == false situation? Who takes over the
role of making list_empty(&wb->work_list) == true?

Just to confirm, since Fabiano Rosas is facing a problem that "write call
hangs in kernel space after virtio hot-remove" and is thinking that we might
need to go the opposite direction
( http://lkml.kernel.org/r/f0787b79-1e50-5f55-a400-44f715451777@linux.ibm.com ).
On Fri 04-05-18 07:35:34, Tetsuo Handa wrote:
> Jan Kara wrote:
> > Make wb_workfn() use wb_wakeup() for requeueing the work which takes all
> > the necessary precautions against racing with bdi unregistration.
> 
> Yes, this patch will solve the NULL pointer dereference bug. But is it OK to
> leave the list_empty(&wb->work_list) == false situation? Who takes over the
> role of making list_empty(&wb->work_list) == true?

That's a good question. The reason is that the last running instance of
wb_workfn() cannot leave with the work_list non-empty. Once WB_registered
is cleared, we cannot add new entries to work_list. Then we queue and
flush the last wb_workfn() run to clean up the list. The NULL pointer
dereference was triggered not by this last running wb_workfn() but by one
running independently, in parallel with wb_shutdown(). So something like:

    CPU0            CPU1            CPU2
    wb_workfn()
      do {
        ...
      } while (!list_empty(&wb->work_list));
                    wb_queue_work()
                      if (test_bit(WB_registered, &wb->state)) {
                        list_add_tail(&work->list, &wb->work_list);
                        mod_delayed_work(bdi_wq, &wb->dwork, 0);
                      }
                                    wb_shutdown()
                                      if (!test_and_clear_bit(WB_registered, &wb->state)) {
                                      ...
                                      mod_delayed_work(bdi_wq, &wb->dwork, 0);
                                      flush_delayed_work(&wb->dwork);
      if (!list_empty(&wb->work_list))
        mod_delayed_work(bdi_wq, &wb->dwork, 0);
          -> queues buggy work

> Just to confirm, since Fabiano Rosas is facing a problem that "write call
> hangs in kernel space after virtio hot-remove" and is thinking that we might
> need to go the opposite direction
> ( http://lkml.kernel.org/r/f0787b79-1e50-5f55-a400-44f715451777@linux.ibm.com ).

Yes, I'm aware of that report and I think it should be solved differently
than what Fabiano suggests.

								Honza
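Jan's argument that the final wb_workfn() run leaves work_list empty rests on wb_queue_work() testing WB_registered under wb->work_lock and, when the bit is clear, completing the work item immediately (via finish_writeback_work()) instead of adding it to the list. A minimal single-threaded sketch of that invariant follows; the toy_* names, the counter standing in for work_list and the "completed" counter standing in for finish_writeback_work() are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the fields involved; not kernel code. */
struct toy_wb_list {
	bool registered;        /* models WB_registered */
	bool pending;           /* models "wb->dwork is queued" */
	unsigned int nr_works;  /* models the length of wb->work_list */
	unsigned int completed; /* works finished, whether run or dropped */
};

/* Models wb_queue_work(): list and kick only while registered. */
void toy_queue_work(struct toy_wb_list *wb)
{
	/* In the kernel this test and list_add_tail() run under wb->work_lock. */
	if (wb->registered) {
		wb->nr_works++;
		wb->pending = true;
	} else {
		wb->completed++; /* like finish_writeback_work(): never listed */
	}
}

/* Models the do { ... } while (!list_empty()) loop in wb_workfn(). */
void toy_workfn_drain(struct toy_wb_list *wb)
{
	wb->pending = false; /* this instance is now running */
	while (wb->nr_works > 0) {
		wb->nr_works--;
		wb->completed++;
	}
}

/* Models wb_shutdown(): clear the bit, then queue and flush a final run. */
void toy_shutdown(struct toy_wb_list *wb)
{
	if (!wb->registered)
		return;
	wb->registered = false;
	wb->pending = true;        /* mod_delayed_work(bdi_wq, &wb->dwork, 0) */
	toy_workfn_drain(wb);      /* flush_delayed_work(): wait for that run */
	assert(wb->nr_works == 0); /* mirrors WARN_ON(!list_empty(...)) */
}
```

The model is sequential on purpose: it shows the invariant Jan relies on, not the race. The crash in the report came from a second, independently running wb_workfn() whose own requeue bypassed the WB_registered check, which is the call the patch switches to wb_wakeup().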
On Fri 04-05-18 07:55:58, Dave Chinner wrote:
> On Thu, May 03, 2018 at 06:26:26PM +0200, Jan Kara wrote:
[...]
> Yup, looks fine - I can't see any more of these open-coded wakeups,
> either, so we should be good here.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Thanks!

> As an aside, why is half the wb infrastructure in fs/fs-writeback.c
> and the other half in mm/backing-dev.c? It seems pretty random as to
> what is where, e.g. wb_wakeup() and wb_wakeup_delayed() are almost
> identical, but are in completely different files...

Yeah, it deserves a cleanup.

								Honza
On Thu 03-05-18 18:26:26, Jan Kara wrote:
> Syzbot has reported that it can hit a NULL pointer dereference in
> wb_workfn() due to wb->bdi->dev being NULL.
[...]
> ---
>  fs/fs-writeback.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Jens, can you please pick up this patch? Probably for the next merge window
(I don't see a reason to rush this at this point in the release cycle). Thanks!

								Honza
On 5/9/18 4:31 AM, Jan Kara wrote:
> On Thu 03-05-18 18:26:26, Jan Kara wrote:
>> Syzbot has reported that it can hit a NULL pointer dereference in
>> wb_workfn() due to wb->bdi->dev being NULL.
[...]
> Jens, can you please pick up this patch? Probably for the next merge window
> (I don't see a reason to rush this at this point in the release cycle). Thanks!

Looks like I never replied to that, but I did pick it up, and it did in
fact go out last week for this series. So we should be all good. I didn't
see a need to postpone it; it's obviously correct and fixes a real issue.
    diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
    index 47d7c151fcba..471d863958bc 100644
    --- a/fs/fs-writeback.c
    +++ b/fs/fs-writeback.c
    @@ -1961,7 +1961,7 @@ void wb_workfn(struct work_struct *work)
     	}
     
     	if (!list_empty(&wb->work_list))
    -		mod_delayed_work(bdi_wq, &wb->dwork, 0);
    +		wb_wakeup(wb);
     	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
     		wb_wakeup_delayed(wb);
Syzbot has reported that it can hit a NULL pointer dereference in
wb_workfn() due to wb->bdi->dev being NULL. This indicates that
wb_workfn() was called for an already unregistered bdi which should not
happen as wb_shutdown() called from bdi_unregister() should make sure
all pending writeback works are completed before bdi is unregistered.
Except that wb_workfn() itself can requeue the work with:

	mod_delayed_work(bdi_wq, &wb->dwork, 0);

and if this happens while wb_shutdown() is waiting in:

	flush_delayed_work(&wb->dwork);

the dwork can get executed after wb_shutdown() has finished and
bdi_unregister() has cleared wb->bdi->dev.

Make wb_workfn() use wb_wakeup() for requeueing the work which takes all
the necessary precautions against racing with bdi unregistration.

CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
CC: Tejun Heo <tj@kernel.org>
Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
Reported-by: syzbot <syzbot+9873874c735f2892e7e9@syzkaller.appspotmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
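The difference between the removed open-coded call and wb_wakeup() can be sketched in userspace. The sketch assumes a pthread mutex in place of wb->work_lock (a spinlock taken with spin_lock_bh() in the kernel) and a boolean in place of the WB_registered bit. toy_wb_wakeup() mirrors the check wb_wakeup() performs before calling mod_delayed_work(), while toy_requeue_open_coded() mirrors the old unconditional call; all toy_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct bdi_writeback. */
struct toy_wb {
	pthread_mutex_t work_lock; /* models wb->work_lock */
	bool registered;           /* models the WB_registered bit */
	bool pending;              /* models "wb->dwork is queued on bdi_wq" */
};

void toy_wb_init(struct toy_wb *wb)
{
	int rc = pthread_mutex_init(&wb->work_lock, NULL);

	assert(rc == 0);
	(void)rc; /* keep NDEBUG builds free of unused-variable warnings */
	wb->registered = true;
	wb->pending = false;
}

/* The old open-coded requeue: queues even for a wb that is going away. */
void toy_requeue_open_coded(struct toy_wb *wb)
{
	wb->pending = true;
}

/* Guarded requeue in the style of wb_wakeup(). */
void toy_wb_wakeup(struct toy_wb *wb)
{
	pthread_mutex_lock(&wb->work_lock);
	if (wb->registered)
		wb->pending = true;
	pthread_mutex_unlock(&wb->work_lock);
}

/*
 * First step of wb_shutdown(): clear the bit under the same lock, so a
 * concurrent toy_wb_wakeup() either queued before this point (in the
 * kernel, wb_shutdown() then flushes that work) or sees registered ==
 * false and does nothing. Returns the previous state.
 */
bool toy_wb_unregister(struct toy_wb *wb)
{
	bool was_registered;

	pthread_mutex_lock(&wb->work_lock);
	was_registered = wb->registered;
	wb->registered = false;
	pthread_mutex_unlock(&wb->work_lock);
	return was_registered;
}
```

The lock is what closes the window: because the bit is cleared and tested under the same lock, a guarded wakeup that loses the race with unregistration queues nothing, whereas the open-coded call can still leave work queued for a bdi whose dev is about to be cleared.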