dma-fence: add comment for WARN_ON in dma_fence_release()

Message ID 20180129154002.3287-1-oded.gabbay@gmail.com (mailing list archive)
State New, archived

Commit Message

Oded Gabbay Jan. 29, 2018, 3:40 p.m. UTC
In dma_fence_release() there is a WARN_ON which could be triggered by
several cases of wrong dma-fence usage. This patch adds a comment to
explain two use-cases to help driver developers that use dma-fence
and trigger that WARN_ON to better understand the reasons for it.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
 drivers/dma-buf/dma-fence.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

Comments

Daniel Vetter Jan. 30, 2018, 10:33 a.m. UTC | #1
On Mon, Jan 29, 2018 at 05:40:02PM +0200, Oded Gabbay wrote:
> In dma_fence_release() there is a WARN_ON which could be triggered by
> several cases of wrong dma-fence usage. This patch adds a comment to
> explain two use-cases to help driver developers that use dma-fence
> and trigger that WARN_ON to better understand the reasons for it.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

Not sure how useful this is, trying to do anything with a dma_fence while
not holding a reference is just plain buggy. If you do that, then all
kinds of use-after-free hilarity can happen. Trying to enumerate all the
ways you can get refcounting wrong seems futile.

What we maybe could do is a simple one-line like

	/* Failed to signal before release, could be a refcounting issue. */

I think this is generally a true statement and more useful hint for the
next convoluted scenario (which in all likelihood will be different from
yours).
-Daniel

> ---
>  drivers/dma-buf/dma-fence.c | 33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 5d101c4053e0..a7170ab23ec0 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -171,6 +171,39 @@ void dma_fence_release(struct kref *kref)
>  
>  	trace_dma_fence_destroy(fence);
>  
> +	/*
> +	 * If the WARN_ON below is triggered it could be because the dma fence
> +	 * was not signaled and therefore, the cb list is still not empty
> +	 * because the cb functions were not called.
> +	 *
> +	 * A more subtle case is where the fence got signaled by a thread that
> +	 * didn't hold a ref to the fence. The following describes the scenario:
> +	 *
> +	 *      Thread A                            Thread B
> +	 *--------------------------        --------------------------
> +	 * calls dma_fence_signal() {
> +	 *      set signal bit
> +	 *
> +	 *            scheduled out
> +	 *      ---------------------------> calls dma_fence_wait_timeout() and
> +	 *                                   returns immediately
> +	 *
> +	 *                                   calls dma_fence_put()
> +	 *                                         |
> +	 *                                         |thread A doesn't hold ref
> +	 *                                         |to fence so ref goes to 0
> +	 *                                         |and release is called
> +	 *                                         |
> +	 *                                         -> dma_fence_release()
> +	 *                                            |
> +	 *                                            -> WARN_ON triggered
> +	 *
> +	 *      go over CB list,
> +	 *      call each CB and remove it
> +	 *      }
> +	 *
> +	 *
> +	 */
>  	WARN_ON(!list_empty(&fence->cb_list));
>  
>  	if (fence->ops->release)
> -- 
> 2.14.3
>

Oded Gabbay Jan. 31, 2018, 9:03 a.m. UTC | #2
On Tue, Jan 30, 2018 at 12:33 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Jan 29, 2018 at 05:40:02PM +0200, Oded Gabbay wrote:
>> In dma_fence_release() there is a WARN_ON which could be triggered by
>> several cases of wrong dma-fence usage. This patch adds a comment to
>> explain two use-cases to help driver developers that use dma-fence
>> and trigger that WARN_ON to better understand the reasons for it.
>>
>> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
>
> Not sure how useful this is, trying to do anything with a dma_fence while
> not holding a reference is just plain buggy. If you do that, then all
> kinds of use-after-free hilarity can happen. Trying to enumerate all the
> ways you can get refcounting wrong seems futile.
>
> What we maybe could do is a simple one-line like
>
>         /* Failed to signal before release, could be a refcounting issue. */
>
> I think this is generally a true statement and more useful hint for the
> next convoluted scenario (which in all likelihood will be different from
> yours).
> -Daniel
>
>> ---
>>  drivers/dma-buf/dma-fence.c | 33 +++++++++++++++++++++++++++++++++
>>  1 file changed, 33 insertions(+)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index 5d101c4053e0..a7170ab23ec0 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -171,6 +171,39 @@ void dma_fence_release(struct kref *kref)
>>
>>       trace_dma_fence_destroy(fence);
>>
>> +     /*
>> +      * If the WARN_ON below is triggered it could be because the dma fence
>> +      * was not signaled and therefore, the cb list is still not empty
>> +      * because the cb functions were not called.
>> +      *
>> +      * A more subtle case is where the fence got signaled by a thread that
>> +      * didn't hold a ref to the fence. The following describes the scenario:
>> +      *
>> +      *      Thread A                            Thread B
>> +      *--------------------------        --------------------------
>> +      * calls dma_fence_signal() {
>> +      *      set signal bit
>> +      *
>> +      *            scheduled out
>> +      *      ---------------------------> calls dma_fence_wait_timeout() and
>> +      *                                   returns immediately
>> +      *
>> +      *                                   calls dma_fence_put()
>> +      *                                         |
>> +      *                                         |thread A doesn't hold ref
>> +      *                                         |to fence so ref goes to 0
>> +      *                                         |and release is called
>> +      *                                         |
>> +      *                                         -> dma_fence_release()
>> +      *                                            |
>> +      *                                            -> WARN_ON triggered
>> +      *
>> +      *      go over CB list,
>> +      *      call each CB and remove it
>> +      *      }
>> +      *
>> +      *
>> +      */
>>       WARN_ON(!list_empty(&fence->cb_list));
>>
>>       if (fence->ops->release)
>> --
>> 2.14.3
>>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

Hi Daniel,
Yeah, I get what you are saying.
Can I add your Reviewed-by to the one-liner version?

Thanks,
Oded

Daniel Vetter Jan. 31, 2018, 10:18 a.m. UTC | #3
On Wed, Jan 31, 2018 at 11:03:39AM +0200, Oded Gabbay wrote:
> On Tue, Jan 30, 2018 at 12:33 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Mon, Jan 29, 2018 at 05:40:02PM +0200, Oded Gabbay wrote:
> >> In dma_fence_release() there is a WARN_ON which could be triggered by
> >> several cases of wrong dma-fence usage. This patch adds a comment to
> >> explain two use-cases to help driver developers that use dma-fence
> >> and trigger that WARN_ON to better understand the reasons for it.
> >>
> >> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
> >
> > Not sure how useful this is, trying to do anything with a dma_fence while
> > not holding a reference is just plain buggy. If you do that, then all
> > kinds of use-after-free hilarity can happen. Trying to enumerate all the
> > ways you can get refcounting wrong seems futile.
> >
> > What we maybe could do is a simple one-line like
> >
> >         /* Failed to signal before release, could be a refcounting issue. */
> >
> > I think this is generally a true statement and more useful hint for the
> > next convoluted scenario (which in all likelihood will be different from
> > yours).
> > -Daniel
> >
> >> ---
> >>  drivers/dma-buf/dma-fence.c | 33 +++++++++++++++++++++++++++++++++
> >>  1 file changed, 33 insertions(+)
> >>
> >> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> >> index 5d101c4053e0..a7170ab23ec0 100644
> >> --- a/drivers/dma-buf/dma-fence.c
> >> +++ b/drivers/dma-buf/dma-fence.c
> >> @@ -171,6 +171,39 @@ void dma_fence_release(struct kref *kref)
> >>
> >>       trace_dma_fence_destroy(fence);
> >>
> >> +     /*
> >> +      * If the WARN_ON below is triggered it could be because the dma fence
> >> +      * was not signaled and therefore, the cb list is still not empty
> >> +      * because the cb functions were not called.
> >> +      *
> >> +      * A more subtle case is where the fence got signaled by a thread that
> >> +      * didn't hold a ref to the fence. The following describes the scenario:
> >> +      *
> >> +      *      Thread A                            Thread B
> >> +      *--------------------------        --------------------------
> >> +      * calls dma_fence_signal() {
> >> +      *      set signal bit
> >> +      *
> >> +      *            scheduled out
> >> +      *      ---------------------------> calls dma_fence_wait_timeout() and
> >> +      *                                   returns immediately
> >> +      *
> >> +      *                                   calls dma_fence_put()
> >> +      *                                         |
> >> +      *                                         |thread A doesn't hold ref
> >> +      *                                         |to fence so ref goes to 0
> >> +      *                                         |and release is called
> >> +      *                                         |
> >> +      *                                         -> dma_fence_release()
> >> +      *                                            |
> >> +      *                                            -> WARN_ON triggered
> >> +      *
> >> +      *      go over CB list,
> >> +      *      call each CB and remove it
> >> +      *      }
> >> +      *
> >> +      *
> >> +      */
> >>       WARN_ON(!list_empty(&fence->cb_list));
> >>
> >>       if (fence->ops->release)
> >> --
> >> 2.14.3
> >>
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> 
> Hi Daniel,
> Yeah, I get what you are saying.
> Can I add your Reviewed-by to the one-liner version?

Sure.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Cheers, Daniel

Patch

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 5d101c4053e0..a7170ab23ec0 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -171,6 +171,39 @@  void dma_fence_release(struct kref *kref)
 
 	trace_dma_fence_destroy(fence);
 
+	/*
+	 * If the WARN_ON below is triggered it could be because the dma fence
+	 * was not signaled and therefore, the cb list is still not empty
+	 * because the cb functions were not called.
+	 *
+	 * A more subtle case is where the fence got signaled by a thread that
+	 * didn't hold a ref to the fence. The following describes the scenario:
+	 *
+	 *      Thread A                            Thread B
+	 *--------------------------        --------------------------
+	 * calls dma_fence_signal() {
+	 *      set signal bit
+	 *
+	 *            scheduled out
+	 *      ---------------------------> calls dma_fence_wait_timeout() and
+	 *                                   returns immediately
+	 *
+	 *                                   calls dma_fence_put()
+	 *                                         |
+	 *                                         |thread A doesn't hold ref
+	 *                                         |to fence so ref goes to 0
+	 *                                         |and release is called
+	 *                                         |
+	 *                                         -> dma_fence_release()
+	 *                                            |
+	 *                                            -> WARN_ON triggered
+	 *
+	 *      go over CB list,
+	 *      call each CB and remove it
+	 *      }
+	 *
+	 *
+	 */
 	WARN_ON(!list_empty(&fence->cb_list));
 
 	if (fence->ops->release)
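
The interleaving drawn in the comment above can be replayed in a small
userspace model. This is only a sketch under stated assumptions: struct
model_fence and every model_* helper are hypothetical stand-ins invented for
illustration, not the kernel's struct dma_fence or its API, and the two
"threads" are replayed sequentially in one function rather than scheduled
concurrently. It shows why the warning fires when the signaling side holds no
reference, and why taking one avoids it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; NOT the kernel's struct dma_fence. */
struct model_fence {
	int refcount;     /* stands in for the kref */
	bool signaled;    /* stands in for the signaled bit */
	int pending_cbs;  /* stands in for a non-empty cb_list */
	bool released;    /* set once the last reference is dropped */
	bool warned;      /* set where the kernel would hit the WARN_ON */
};

static void model_fence_release(struct model_fence *f)
{
	/* Models WARN_ON(!list_empty(&fence->cb_list)). */
	if (f->pending_cbs != 0) {
		f->warned = true;
		fprintf(stderr, "WARN: released with %d pending callback(s)\n",
			f->pending_cbs);
	}
	f->released = true;
}

static void model_fence_get(struct model_fence *f)
{
	f->refcount++;
}

static void model_fence_put(struct model_fence *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		model_fence_release(f);
}

/* Signaling is split into its two halves so the race can be replayed. */
static void model_signal_set_bit(struct model_fence *f)
{
	f->signaled = true;
}

static void model_signal_run_cbs(struct model_fence *f)
{
	f->pending_cbs = 0; /* "call each CB and remove it" */
}

/*
 * Replays the interleaving from the comment. Returns true if the modeled
 * warning fired.
 */
bool replay_interleaving(bool signaler_holds_ref)
{
	/* Thread B (the waiter) owns the initial reference; one CB is queued. */
	struct model_fence f = { .refcount = 1, .pending_cbs = 1 };

	if (signaler_holds_ref)
		model_fence_get(&f);   /* thread A takes its own reference */

	model_signal_set_bit(&f);      /* thread A: sets the bit, is scheduled out */
	model_fence_put(&f);           /* thread B: wait returns, drops its ref */

	/*
	 * Thread A resumes. In the real bug it would now touch freed memory;
	 * the model simply skips that step once the fence has been released.
	 */
	if (!f.released)
		model_signal_run_cbs(&f);

	if (signaler_holds_ref)
		model_fence_put(&f);   /* last ref dropped with an empty CB list */

	return f.warned;
}
```

Calling replay_interleaving(false) reproduces the case in the comment: the
waiter's put drops the last reference while a callback is still queued. With
replay_interleaving(true) the signaler's own reference keeps the fence alive
until the callbacks have run, so the release path sees an empty list.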