diff mbox

dma-buf/dma-fence: Signal all callbacks from dma_fence_release()

Message ID 20170811170147.26416-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson Aug. 11, 2017, 5:01 p.m. UTC
This is an illegal scenario, to free the fence whilst there are pending
callbacks. Currently, we emit a WARN and then cast aside the callbacks
leaving them dangling. Alternatively, we could set an error on the fence
and then signal fence so that any dependency chains from the fence can
be tidied up, and if they care they can check for the error.

The question is whether or not the cure is worse than the disease
(premature fence signaling is never pretty).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/dma-buf/dma-fence.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

Comments

Christian König Aug. 13, 2017, 1:04 p.m. UTC | #1
Am 11.08.2017 um 19:01 schrieb Chris Wilson:
> This is an illegal scenario, to free the fence whilst there are pending
> callbacks. Currently, we emit a WARN and then cast aside the callbacks
> leaving them dangling. Alternatively, we could set an error on the fence
> and then signal fence so that any dependency chains from the fence can
> be tidied up, and if they care they can check for the error.
>
> The question is whether or not the cure is worse than the disease
> (premature fence signaling is never pretty).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Not sure if -EDEADLK is the best error code, but in general the approach 
sounds like the least evil to me.

Patch is Reviewed-by: Christian König <christian.koenig@amd.com>.

> ---
>   drivers/dma-buf/dma-fence.c | 14 +++++++++++++-
>   1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 9a302799040e..ed311edbeefa 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -172,7 +172,19 @@ void dma_fence_release(struct kref *kref)
>   
>   	trace_dma_fence_destroy(fence);
>   
> -	WARN_ON(!list_empty(&fence->cb_list));
> +	if (WARN_ON(!list_empty(&fence->cb_list))) {
> +		unsigned long flags;
> +
> +		/*
> +		 * This should never happen, but if it does make sure that we
> +		 * don't leave chains dangling. We set the error flag first
> +		 * so that the callbacks know this signal is due to an error.
> +		 */
> +		spin_lock_irqsave(fence->lock, flags);
> +		fence->error = -EDEADLK;
> +		dma_fence_signal_locked(fence);
> +		spin_unlock_irqrestore(fence->lock, flags);
> +	}
>   
>   	if (fence->ops->release)
>   		fence->ops->release(fence);
Chris Wilson Aug. 15, 2017, 11:23 a.m. UTC | #2
Quoting Christian König (2017-08-13 14:04:29)
> Am 11.08.2017 um 19:01 schrieb Chris Wilson:
> > This is an illegal scenario, to free the fence whilst there are pending
> > callbacks. Currently, we emit a WARN and then cast aside the callbacks
> > leaving them dangling. Alternatively, we could set an error on the fence
> > and then signal fence so that any dependency chains from the fence can
> > be tidied up, and if they care they can check for the error.
> >
> > The question is whether or not the cure is worse than the disease
> > (premature fence signaling is never pretty).
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Not sure if -EDEADLK is the best error code, but in general the approach 
> sounds like the least evil to me.

EDEADLK felt appropriate since that was the situation that I expect might
arise, well livelock.

-ENXIO? That seems a reasonable alternative.
-ENODEV?
The classic -ENOTTY?
-Chris
Christian König Aug. 15, 2017, 12:01 p.m. UTC | #3
Am 15.08.2017 um 13:23 schrieb Chris Wilson:
> Quoting Christian König (2017-08-13 14:04:29)
>> Am 11.08.2017 um 19:01 schrieb Chris Wilson:
>>> This is an illegal scenario, to free the fence whilst there are pending
>>> callbacks. Currently, we emit a WARN and then cast aside the callbacks
>>> leaving them dangling. Alternatively, we could set an error on the fence
>>> and then signal fence so that any dependency chains from the fence can
>>> be tidied up, and if they care they can check for the error.
>>>
>>> The question is whether or not the cure is worse than the disease
>>> (premature fence signaling is never pretty).
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Not sure if -EDEADLK is the best error code, but in general the approach
>> sounds like the least evil to me.
> EDEADLK felt appropriate since that was the situation that I expect might
> arise, well livelock.
>
> -ENXIO? That seems a reasonable alternative.
> -ENODEV?
> The classic -ENOTTY?

No strong opinion either. For now I think we should just stick with 
-EDEADLK.

We would need something like -EHUMAN, because in this particular case 
the human who wrote the code has messed up the reference counting.

Christian.

> -Chris
Chris Wilson Jan. 31, 2018, 10:22 a.m. UTC | #4
Quoting Christian König (2017-08-13 14:04:29)
> Am 11.08.2017 um 19:01 schrieb Chris Wilson:
> > This is an illegal scenario, to free the fence whilst there are pending
> > callbacks. Currently, we emit a WARN and then cast aside the callbacks
> > leaving them dangling. Alternatively, we could set an error on the fence
> > and then signal fence so that any dependency chains from the fence can
> > be tidied up, and if they care they can check for the error.
> >
> > The question is whether or not the cure is worse than the disease
> > (premature fence signaling is never pretty).
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Not sure if -EDEADLK is the best error code, but in general the approach 
> sounds like the least evil to me.
> 
> Patch is Reviewed-by: Christian König <christian.koenig@amd.com>.

Since the spectre of an early dma_fence_release() has shown itself
recently, I thought I'd ping this patch for inclusion along with the new
comment.
-Chris

> > ---
> >   drivers/dma-buf/dma-fence.c | 14 +++++++++++++-
> >   1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index 9a302799040e..ed311edbeefa 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -172,7 +172,19 @@ void dma_fence_release(struct kref *kref)
> >   
> >       trace_dma_fence_destroy(fence);
> >   
> > -     WARN_ON(!list_empty(&fence->cb_list));
> > +     if (WARN_ON(!list_empty(&fence->cb_list))) {
> > +             unsigned long flags;
> > +
> > +             /*
> > +              * This should never happen, but if it does make sure that we
> > +              * don't leave chains dangling. We set the error flag first
> > +              * so that the callbacks know this signal is due to an error.
> > +              */
> > +             spin_lock_irqsave(fence->lock, flags);
> > +             fence->error = -EDEADLK;
> > +             dma_fence_signal_locked(fence);
> > +             spin_unlock_irqrestore(fence->lock, flags);
> > +     }
> >   
> >       if (fence->ops->release)
> >               fence->ops->release(fence);
> 
>
diff mbox

Patch

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 9a302799040e..ed311edbeefa 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -172,7 +172,19 @@  void dma_fence_release(struct kref *kref)
 
 	trace_dma_fence_destroy(fence);
 
-	WARN_ON(!list_empty(&fence->cb_list));
+	if (WARN_ON(!list_empty(&fence->cb_list))) {
+		unsigned long flags;
+
+		/*
+		 * This should never happen, but if it does make sure that we
+		 * don't leave chains dangling. We set the error flag first
+		 * so that the callbacks know this signal is due to an error.
+		 */
+		spin_lock_irqsave(fence->lock, flags);
+		fence->error = -EDEADLK;
+		dma_fence_signal_locked(fence);
+		spin_unlock_irqrestore(fence->lock, flags);
+	}
 
 	if (fence->ops->release)
 		fence->ops->release(fence);