Message ID | 20170727190353.3353-1-gustavo@padovan.org (mailing list archive) |
---|---|
State | New, archived |
Quoting Gustavo Padovan (2017-07-27 20:03:53)
> From: Gustavo Padovan <gustavo.padovan@collabora.com>
>
> If userspace already dropped its own reference by closing the sw_sync
> fence fd we might end up in a deadlock where
> dma_fence_is_signaled_locked() will trigger the release of the fence and
> thus try to hold the lock to remove the fence from the list.

So the issue here is that the call to dma_fence_is_signaled_locked() is
triggering the unreference?

> We need to grab a reference to the fence before calling into this chain if
> we want to avoid this issue.
>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>
> ---
>  drivers/dma-buf/sw_sync.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> index af1bc84..8291434 100644
> --- a/drivers/dma-buf/sw_sync.c
> +++ b/drivers/dma-buf/sw_sync.c
> @@ -144,11 +144,16 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
>  	obj->value += inc;
>
>  	list_for_each_entry_safe(pt, next, &obj->pt_list, link) {
> -		if (!dma_fence_is_signaled_locked(&pt->base))
> +		dma_fence_get(&pt->base);

This would need to be dma_fence_get_rcu() to avoid grabbing the fence
when its refcount has hit 0.

> +		if (!dma_fence_is_signaled_locked(&pt->base)) {
> +			dma_fence_put(&pt->base);
>  			break;
> +		}
>
>  		list_del_init(&pt->link);
>  		rb_erase(&pt->node, &obj->pt_tree);

But if I understand correctly, we just need to unlink first, then
signal.

	list_for_each_entry_safe() {
		if (!timeline_fence_signaled(&pt->base))
			break;

		list_del_init(&pt->link);
		rb_erase(&pt->node, &obj->pt_tree);

		dma_fence_signal_locked(&pt->base);
	}

The challenge is in writing the comment to explain the open-coding.
-Chris
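The dma_fence_get_rcu() remark above is about taking a reference only while the count is still non-zero. In the kernel that helper is built on kref_get_unless_zero(). What follows is a minimal userspace sketch of the idea using C11 atomics, not kernel code. struct toy_fence and the toy_fence_* helpers are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted fence; not the kernel's struct dma_fence. */
struct toy_fence {
	atomic_uint refcount;
};

/* Unconditional get: increments even if the count already dropped to 0,
 * which would "resurrect" an object that is already being released. */
static void toy_fence_get(struct toy_fence *f)
{
	atomic_fetch_add(&f->refcount, 1);
}

/* Get-unless-zero: succeeds only while at least one reference is still held. */
static bool toy_fence_get_unless_zero(struct toy_fence *f)
{
	unsigned int old = atomic_load(&f->refcount);

	while (old != 0) {
		/* On failure, old is refreshed with the current value and we retry. */
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drops one reference; returns true if the caller dropped the last one. */
static bool toy_fence_put(struct toy_fence *f)
{
	unsigned int old = atomic_fetch_sub(&f->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

A caller that only sees the object through a list, as sync_timeline_signal() does, would use the get-unless-zero variant and skip the entry when it returns false. The unconditional toy_fence_get() is included only for contrast.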
2017-07-27 Chris Wilson <chris@chris-wilson.co.uk>:

> Quoting Gustavo Padovan (2017-07-27 20:03:53)
> > From: Gustavo Padovan <gustavo.padovan@collabora.com>
> >
> > If userspace already dropped its own reference by closing the sw_sync
> > fence fd we might end up in a deadlock where
> > dma_fence_is_signaled_locked() will trigger the release of the fence and
> > thus try to hold the lock to remove the fence from the list.
>
> So the issue here is that the call to dma_fence_is_signaled_locked() is
> triggering the unreference?

Exactly. I'll say that explicitly in the commit message.

>
> > We need to grab a reference to the fence before calling into this chain if
> > we want to avoid this issue.
> >
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>
> > ---
> >  drivers/dma-buf/sw_sync.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> > index af1bc84..8291434 100644
> > --- a/drivers/dma-buf/sw_sync.c
> > +++ b/drivers/dma-buf/sw_sync.c
> > @@ -144,11 +144,16 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
> >  	obj->value += inc;
> >
> >  	list_for_each_entry_safe(pt, next, &obj->pt_list, link) {
> > -		if (!dma_fence_is_signaled_locked(&pt->base))
> > +		dma_fence_get(&pt->base);
>
> This would need to be dma_fence_get_rcu() to avoid grabbing the fence
> when its refcount has hit 0.
>
> > +		if (!dma_fence_is_signaled_locked(&pt->base)) {
> > +			dma_fence_put(&pt->base);
> >  			break;
> > +		}
> >
> >  		list_del_init(&pt->link);
> >  		rb_erase(&pt->node, &obj->pt_tree);
>
> But if I understand correctly, we just need to unlink first, then
> signal.
>
> 	list_for_each_entry_safe() {
> 		if (!timeline_fence_signaled(&pt->base))
> 			break;
>
> 		list_del_init(&pt->link);
> 		rb_erase(&pt->node, &obj->pt_tree);
>
> 		dma_fence_signal_locked(&pt->base);
> 	}
>
> The challenge is in writing the comment to explain the open-coding.

That is cleaner and doesn't need the get/put dance. I'll come up with a
comment to explain it.

Gustavo
On Thu, Jul 27, 2017 at 04:03:53PM -0300, Gustavo Padovan wrote:
> From: Gustavo Padovan <gustavo.padovan@collabora.com>
>
> If userspace already dropped its own reference by closing the sw_sync
> fence fd we might end up in a deadlock where
> dma_fence_is_signaled_locked() will trigger the release of the fence and
> thus try to hold the lock to remove the fence from the list.
>
> We need to grab a reference to the fence before calling into this chain if
> we want to avoid this issue.
>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>

Do we have a testcase for this?
-Daniel

> ---
>  drivers/dma-buf/sw_sync.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> index af1bc84..8291434 100644
> --- a/drivers/dma-buf/sw_sync.c
> +++ b/drivers/dma-buf/sw_sync.c
> @@ -144,11 +144,16 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
>  	obj->value += inc;
>
>  	list_for_each_entry_safe(pt, next, &obj->pt_list, link) {
> -		if (!dma_fence_is_signaled_locked(&pt->base))
> +		dma_fence_get(&pt->base);
> +		if (!dma_fence_is_signaled_locked(&pt->base)) {
> +			dma_fence_put(&pt->base);
>  			break;
> +		}
>
>  		list_del_init(&pt->link);
>  		rb_erase(&pt->node, &obj->pt_tree);
> +
> +		dma_fence_put(&pt->base);
>  	}
>
>  	spin_unlock_irq(&obj->lock);
> --
> 2.9.4
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Quoting Gustavo Padovan (2017-07-28 02:57:25)
> 2017-07-27 Chris Wilson <chris@chris-wilson.co.uk>:
>
> > Quoting Gustavo Padovan (2017-07-27 20:03:53)
> > > From: Gustavo Padovan <gustavo.padovan@collabora.com>
> > >
> > > If userspace already dropped its own reference by closing the sw_sync
> > > fence fd we might end up in a deadlock where
> > > dma_fence_is_signaled_locked() will trigger the release of the fence and
> > > thus try to hold the lock to remove the fence from the list.
> >
> > So the issue here is that the call to dma_fence_is_signaled_locked() is
> > triggering the unreference?
>
> Exactly. I'll say that explicitly in the commit message.

:)

It was more of a rhetorical question making sure that I understood
correctly.

> > But if I understand correctly, we just need to unlink first, then
> > signal.
> >
> > 	list_for_each_entry_safe() {
> > 		if (!timeline_fence_signaled(&pt->base))
> > 			break;
> >
> > 		list_del_init(&pt->link);
> > 		rb_erase(&pt->node, &obj->pt_tree);
> >
> > 		dma_fence_signal_locked(&pt->base);
> > 	}
> >
> > The challenge is in writing the comment to explain the open-coding.
>
> That is cleaner and doesn't need the get/put dance. I'll come up with a
> comment to explain it.

...
	/*
	 * A signal callback may release the last reference to this fence,
	 * causing it to be freed. That operation has to be last to avoid
	 * a use after free inside this loop, and must be after we remove
	 * the fence from the timeline in order to prevent deadlocking on
	 * timeline->lock inside timeline_fence_release().
	 */
	dma_fence_signal_locked().
-Chris
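The ordering argument in the proposed comment can be tried out in a small userspace toy model. This is only a sketch under stated assumptions, not the sw_sync code. Every toy_* name is hypothetical. The spinlock is modeled as a flag that records a would-be deadlock instead of spinning. The release path is assumed to take the timeline lock only when the point is still linked. Sequence numbers are compared without the wraparound handling a real timeline would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not the kernel's types. */
struct toy_timeline {
	bool locked;      /* models timeline->lock as a non-recursive lock */
	bool deadlocked;  /* set when the lock is requested while already held */
	unsigned int value;
};

struct toy_pt {
	struct toy_timeline *parent;
	unsigned int seqno;
	unsigned int refcount;
	bool linked;      /* still on the timeline's list */
	bool released;
};

/* Returns false (and records it) where a real spinlock would spin forever. */
static bool toy_trylock(struct toy_timeline *tl)
{
	if (tl->locked) {
		tl->deadlocked = true;
		return false;
	}
	tl->locked = true;
	return true;
}

static void toy_unlock(struct toy_timeline *tl)
{
	tl->locked = false;
}

/* Models the release path: a still-linked point needs the lock to unlink. */
static void toy_release(struct toy_pt *pt)
{
	if (pt->linked) {
		if (!toy_trylock(pt->parent))
			return; /* this is the deadlock described in the thread */
		pt->linked = false;
		toy_unlock(pt->parent);
	}
	pt->released = true;
}

/* Models signaling: a callback drops what may be the last reference. */
static void toy_signal_locked(struct toy_pt *pt)
{
	assert(pt->refcount > 0);
	if (--pt->refcount == 0)
		toy_release(pt);
}

/* Advances the timeline over points sorted by seqno; returns true if no
 * deadlock was recorded. unlink_first selects the ordering under test. */
static bool toy_timeline_signal(struct toy_timeline *tl, struct toy_pt *pts,
				size_t n, unsigned int inc, bool unlink_first)
{
	size_t i;

	toy_trylock(tl);
	tl->value += inc;
	for (i = 0; i < n; i++) {
		struct toy_pt *pt = &pts[i];

		if (pt->seqno > tl->value)
			break;
		if (unlink_first) {
			pt->linked = false;
			toy_signal_locked(pt); /* last: pt may be released here */
		} else {
			toy_signal_locked(pt); /* release runs while still linked */
			pt->linked = false;
		}
	}
	toy_unlock(tl);
	return !tl->deadlocked;
}

/* One point whose only remaining reference is dropped by the signal. */
static bool toy_demo(bool unlink_first)
{
	struct toy_timeline tl = { 0 };
	struct toy_pt pt = { .parent = &tl, .seqno = 1, .refcount = 1, .linked = true };

	return toy_timeline_signal(&tl, &pt, 1, 1, unlink_first);
}
```

In this model toy_demo(true) completes cleanly, while toy_demo(false) records the nested lock attempt. That is the same reason the comment gives for placing dma_fence_signal_locked() after the unlink.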
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index af1bc84..8291434 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -144,11 +144,16 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
 	obj->value += inc;
 
 	list_for_each_entry_safe(pt, next, &obj->pt_list, link) {
-		if (!dma_fence_is_signaled_locked(&pt->base))
+		dma_fence_get(&pt->base);
+		if (!dma_fence_is_signaled_locked(&pt->base)) {
+			dma_fence_put(&pt->base);
 			break;
+		}
 
 		list_del_init(&pt->link);
 		rb_erase(&pt->node, &obj->pt_tree);
+
+		dma_fence_put(&pt->base);
 	}
 
 	spin_unlock_irq(&obj->lock);