Message ID | 20180611160131.GQ1351649@devbig577.frc2.facebook.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon 11-06-18 09:01:31, Tejun Heo wrote: > Hello, > > On Mon, Jun 11, 2018 at 11:12:48AM +0200, Jan Kara wrote: > > However this is wrong and so is the patch. The problem is in > > cgwb_bdi_unregister() which does cgwb_kill() and thus drops bdi's > > reference to wb structures before going through the list of wbs again and > > calling wb_shutdown() on each of them. The writeback structures we are > > accessing at this point can be already freed in principle like: > > > > CPU1 CPU2 > > cgwb_bdi_unregister() > > cgwb_kill(*slot); > > > > cgwb_release() > > queue_work(cgwb_release_wq, &wb->release_work); > > cgwb_release_workfn() > > wb = list_first_entry(&bdi->wb_list, ...) > > spin_unlock_irq(&cgwb_lock); > > wb_shutdown(wb); > > ... > > kfree_rcu(wb, rcu); > > wb_shutdown(wb); -> oops use-after-free > > > > I'm not 100% sure how to fix this. wb structures can be at various phases of > > shutdown (or there may be other external references still existing) when we > > enter cgwb_bdi_unregister() so I think adding a way for cgwb_bdi_unregister() > > to wait for standard wb shutdown path to finish is the most robust way. > > What do you think about attached patch Tejun? So far only compile tested... > > > > Possible problem with it is that now cgwb_bdi_unregister() will wait for > > all wb references to be dropped so it adds some implicit dependencies to > > bdi shutdown path. > > Would something like the following work or am I missing the point > entirely? I was pondering the same solution for a while but I think it won't work. The problem is that e.g. wb_memcg_offline() could have already removed wb from the radix tree but it is still pending in bdi->wb_list (wb_shutdown() has not run yet) and so we'd drop reference we didn't get. Honza > diff --git a/mm/backing-dev.c b/mm/backing-dev.c > index 347cc83..359cacd 100644 > --- a/mm/backing-dev.c > +++ b/mm/backing-dev.c > @@ -715,14 +715,19 @@ static void cgwb_bdi_unregister(struct backing_dev_info *bdi) > WARN_ON(test_bit(WB_registered, &bdi->wb.state)); > > spin_lock_irq(&cgwb_lock); > - radix_tree_for_each_slot(slot, &bdi->cgwb_tree, &iter, 0) > - cgwb_kill(*slot); > + radix_tree_for_each_slot(slot, &bdi->cgwb_tree, &iter, 0) { > + struct bdi_writeback *wb = *slot; > + > + wb_get(wb); > + cgwb_kill(wb); > + } > > while (!list_empty(&bdi->wb_list)) { > wb = list_first_entry(&bdi->wb_list, struct bdi_writeback, > bdi_node); > spin_unlock_irq(&cgwb_lock); > wb_shutdown(wb); > + wb_put(wb); > spin_lock_irq(&cgwb_lock); > } > spin_unlock_irq(&cgwb_lock); > > > -- > tejun
Hello, On Mon, Jun 11, 2018 at 06:29:20PM +0200, Jan Kara wrote: > > Would something like the following work or am I missing the point > > entirely? > > I was pondering the same solution for a while but I think it won't work. > The problem is that e.g. wb_memcg_offline() could have already removed > wb from the radix tree but it is still pending in bdi->wb_list > (wb_shutdown() has not run yet) and so we'd drop reference we didn't get. Yeah, right, so the root cause is that we're walking the wb_list while holding lock and expecting the object to stay there even after lock is released. Hmm... we can use a mutex to synchronize the two destruction paths. It's not like they're hot paths anyway. Thanks.
On Mon 11-06-18 10:20:53, Tejun Heo wrote: > Hello, > > On Mon, Jun 11, 2018 at 06:29:20PM +0200, Jan Kara wrote: > > > Would something like the following work or am I missing the point > > > entirely? > > > > I was pondering the same solution for a while but I think it won't work. > > The problem is that e.g. wb_memcg_offline() could have already removed > > wb from the radix tree but it is still pending in bdi->wb_list > > (wb_shutdown() has not run yet) and so we'd drop reference we didn't get. > > Yeah, right, so the root cause is that we're walking the wb_list while > holding lock and expecting the object to stay there even after lock is > released. Hmm... we can use a mutex to synchronize the two > destruction paths. It's not like they're hot paths anyway. Hmm, do you mean like having a per-bdi or even a global mutex that would protect whole wb_shutdown()? Yes, that should work and we could get rid of WB_shutting_down bit as well with that. Just it seems a bit strange to introduce a mutex only to synchronize these two shutdown paths - usually locks protect data structures and in this case we have cgwb_lock for that so it looks like a duplication from a first look. Honza
Hello, Jan. On Tue, Jun 12, 2018 at 05:57:54PM +0200, Jan Kara wrote: > > Yeah, right, so the root cause is that we're walking the wb_list while > > holding lock and expecting the object to stay there even after lock is > > released. Hmm... we can use a mutex to synchronize the two > > destruction paths. It's not like they're hot paths anyway. > > Hmm, do you mean like having a per-bdi or even a global mutex that would > protect whole wb_shutdown()? Yes, that should work and we could get rid of > WB_shutting_down bit as well with that. Just it seems a bit strange to Yeap. > introduce a mutex only to synchronize these two shutdown paths - usually > locks protect data structures and in this case we have cgwb_lock for > that so it looks like a duplication from a first look. Yeah, I feel a bit reluctant too but I think that's the right thing to do here. This is an inherently weird case where there are two ways that an object can go away with the immediate drain requirement from one side. It's not a hot path and the dumber the synchronization the better, right? Thanks.
diff --git a/mm/backing-dev.c b/mm/backing-dev.c index 347cc83..359cacd 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -715,14 +715,19 @@ static void cgwb_bdi_unregister(struct backing_dev_info *bdi) WARN_ON(test_bit(WB_registered, &bdi->wb.state)); spin_lock_irq(&cgwb_lock); - radix_tree_for_each_slot(slot, &bdi->cgwb_tree, &iter, 0) - cgwb_kill(*slot); + radix_tree_for_each_slot(slot, &bdi->cgwb_tree, &iter, 0) { + struct bdi_writeback *wb = *slot; + + wb_get(wb); + cgwb_kill(wb); + } while (!list_empty(&bdi->wb_list)) { wb = list_first_entry(&bdi->wb_list, struct bdi_writeback, bdi_node); spin_unlock_irq(&cgwb_lock); wb_shutdown(wb); + wb_put(wb); spin_lock_irq(&cgwb_lock); } spin_unlock_irq(&cgwb_lock);