diff mbox

dma-buf/sync_file: Always increment refcount when merging fences.

Message ID 1473809067-6019-1-git-send-email-rafael.antognolli@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Rafael Antognolli Sept. 13, 2016, 11:24 p.m. UTC
The refcount of a fence should be increased whenever it is added to a merged
fence, since it will later be decreased when the merged fence is destroyed.
Failing to do so will cause the original fence to be freed if the merged fence
gets freed, but other places still referencing won't know about it.

This patch fixes a kernel panic that can be triggered by creating a fence that
is expired (or increasing the timeline until it expires), then creating a
merged fence out of it, and deleting the merged fence. This will make the
original expired fence's refcount go to zero.

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
---

Sample code to trigger the mentioned kernel panic (might need to be executed a
couple times before it actually breaks everything):

static void test_sync_expired_merge(void)
{
       int iterations = 1 << 20;
       int timeline;
       int i;
       int fence_expired, fence_merged;

       timeline = sw_sync_timeline_create();

       sw_sync_timeline_inc(timeline, 100);
       fence_expired = sw_sync_fence_create(timeline, 1);
       fence_merged = sw_sync_merge(fence_expired, fence_expired);
       sw_sync_fence_destroy(fence_merged);

       for (i = 0; i < iterations; i++) {
               int fence = sw_sync_merge(fence_expired, fence_expired);

               igt_assert_f(sw_sync_wait(fence, -1) > 0,
                                    "Failure waiting on fence\n");
               sw_sync_fence_destroy(fence);
       }

       sw_sync_fence_destroy(fence_expired);
}

 drivers/dma-buf/sync_file.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

Comments

Gustavo Padovan Sept. 13, 2016, 11:41 p.m. UTC | #1
Hi Rafael,

2016-09-13 Rafael Antognolli <rafael.antognolli@intel.com>:

> The refcount of a fence should be increased whenever it is added to a merged
> fence, since it will later be decreased when the merged fence is destroyed.
> Failing to do so will cause the original fence to be freed if the merged fence
> gets freed, but other places still referencing won't know about it.
> 
> This patch fixes a kernel panic that can be triggered by creating a fence that
> is expired (or increasing the timeline until it expires), then creating a
> merged fence out of it, and deleting the merged fence. This will make the
> original expired fence's refcount go to zero.
> 
> Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
> ---
> 
> Sample code to trigger the mentioned kernel panic (might need to be executed a
> couple times before it actually breaks everything):
> 
> static void test_sync_expired_merge(void)
> {
>        int iterations = 1 << 20;
>        int timeline;
>        int i;
>        int fence_expired, fence_merged;
> 
>        timeline = sw_sync_timeline_create();
> 
>        sw_sync_timeline_inc(timeline, 100);
>        fence_expired = sw_sync_fence_create(timeline, 1);
>        fence_merged = sw_sync_merge(fence_expired, fence_expired);
>        sw_sync_fence_destroy(fence_merged);
> 
>        for (i = 0; i < iterations; i++) {
>                int fence = sw_sync_merge(fence_expired, fence_expired);
> 
>                igt_assert_f(sw_sync_wait(fence, -1) > 0,
>                                     "Failure waiting on fence\n");
>                sw_sync_fence_destroy(fence);
>        }
> 
>        sw_sync_fence_destroy(fence_expired);
> }
> 
>  drivers/dma-buf/sync_file.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)


Thanks for spotting this.

Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

Gustavo
Chris Wilson Sept. 14, 2016, 10:05 a.m. UTC | #2
On Tue, Sep 13, 2016 at 04:24:27PM -0700, Rafael Antognolli wrote:
> The refcount of a fence should be increased whenever it is added to a merged
> fence, since it will later be decreased when the merged fence is destroyed.
> Failing to do so will cause the original fence to be freed if the merged fence
> gets freed, but other places still referencing won't know about it.
> 
> This patch fixes a kernel panic that can be triggered by creating a fence that
> is expired (or increasing the timeline until it expires), then creating a
> merged fence out of it, and deleting the merged fence. This will make the
> original expired fence's refcount go to zero.
> 
> Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
> ---
> 
> Sample code to trigger the mentioned kernel panic (might need to be executed a
> couple times before it actually breaks everything):
> 
> static void test_sync_expired_merge(void)
> {
>        int iterations = 1 << 20;
>        int timeline;
>        int i;
>        int fence_expired, fence_merged;
> 
>        timeline = sw_sync_timeline_create();
> 
>        sw_sync_timeline_inc(timeline, 100);
>        fence_expired = sw_sync_fence_create(timeline, 1);
>        fence_merged = sw_sync_merge(fence_expired, fence_expired);
>        sw_sync_fence_destroy(fence_merged);
> 
>        for (i = 0; i < iterations; i++) {
>                int fence = sw_sync_merge(fence_expired, fence_expired);
> 
>                igt_assert_f(sw_sync_wait(fence, -1) > 0,
>                                     "Failure waiting on fence\n");
>                sw_sync_fence_destroy(fence);
>        }
> 
>        sw_sync_fence_destroy(fence_expired);
> }
> 
>  drivers/dma-buf/sync_file.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> index 486d29c..6ce6b8f 100644
> --- a/drivers/dma-buf/sync_file.c
> +++ b/drivers/dma-buf/sync_file.c
> @@ -178,11 +178,8 @@ static struct fence **get_fences(struct sync_file *sync_file, int *num_fences)
>  static void add_fence(struct fence **fences, int *i, struct fence *fence)
>  {
>  	fences[*i] = fence;
> -
> -	if (!fence_is_signaled(fence)) {
> -		fence_get(fence);
> -		(*i)++;
> -	}
> +	fence_get(fence);
> +	(*i)++;
>  }

I think you'll find it's the caller:

if (i == 0) {
	add_fence(fences, &i, a_fences[0]);
	i++;
}

that does the unexpected.

This should be 

if (i == 0)
	fences[i++] = fence_get(a_fences[0]);


That ensures the sync_file inherits the signaled status without having
to keep all fences.

I think there still seems to be a memory leak when calling
sync_file_set_fence() here with i == 1.
-Chris
Gustavo Padovan Sept. 14, 2016, 2:04 p.m. UTC | #3
2016-09-14 Chris Wilson <chris@chris-wilson.co.uk>:

> On Tue, Sep 13, 2016 at 04:24:27PM -0700, Rafael Antognolli wrote:
> > The refcount of a fence should be increased whenever it is added to a merged
> > fence, since it will later be decreased when the merged fence is destroyed.
> > Failing to do so will cause the original fence to be freed if the merged fence
> > gets freed, but other places still referencing won't know about it.
> > 
> > This patch fixes a kernel panic that can be triggered by creating a fence that
> > is expired (or increasing the timeline until it expires), then creating a
> > merged fence out of it, and deleting the merged fence. This will make the
> > original expired fence's refcount go to zero.
> > 
> > Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
> > ---
> > 
> > Sample code to trigger the mentioned kernel panic (might need to be executed a
> > couple times before it actually breaks everything):
> > 
> > static void test_sync_expired_merge(void)
> > {
> >        int iterations = 1 << 20;
> >        int timeline;
> >        int i;
> >        int fence_expired, fence_merged;
> > 
> >        timeline = sw_sync_timeline_create();
> > 
> >        sw_sync_timeline_inc(timeline, 100);
> >        fence_expired = sw_sync_fence_create(timeline, 1);
> >        fence_merged = sw_sync_merge(fence_expired, fence_expired);
> >        sw_sync_fence_destroy(fence_merged);
> > 
> >        for (i = 0; i < iterations; i++) {
> >                int fence = sw_sync_merge(fence_expired, fence_expired);
> > 
> >                igt_assert_f(sw_sync_wait(fence, -1) > 0,
> >                                     "Failure waiting on fence\n");
> >                sw_sync_fence_destroy(fence);
> >        }
> > 
> >        sw_sync_fence_destroy(fence_expired);
> > }
> > 
> >  drivers/dma-buf/sync_file.c | 7 ++-----
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > index 486d29c..6ce6b8f 100644
> > --- a/drivers/dma-buf/sync_file.c
> > +++ b/drivers/dma-buf/sync_file.c
> > @@ -178,11 +178,8 @@ static struct fence **get_fences(struct sync_file *sync_file, int *num_fences)
> >  static void add_fence(struct fence **fences, int *i, struct fence *fence)
> >  {
> >  	fences[*i] = fence;
> > -
> > -	if (!fence_is_signaled(fence)) {
> > -		fence_get(fence);
> > -		(*i)++;
> > -	}
> > +	fence_get(fence);
> > +	(*i)++;
> >  }
> 
> I think you'll find it's the caller:
> 
> if (i == 0) {
> 	add_fence(fences, &i, a_fences[0]);
> 	i++;
> }
> 
> that does the unexpected.
> 
> This should be 
> 
> if (i == 0)
> 	fences[i++] = fence_get(a_fences[0]);

You are right. Otherwise we would miss the extra reference. i == 0 here
means that all fences are signalled.

> 
> 
> That ensures the sync_file inherits the signaled status without having
> to keep all fences.
> 
> I think there still seems to be a memory leak when calling
> sync_file_set_fence() here with i == 1.

I think that is something we discussed already, we don't hold an extra
ref there for i == 1 because it would leak the fence.

Gustavo
Rafael Antognolli Sept. 14, 2016, 3:45 p.m. UTC | #4
On Wed, Sep 14, 2016 at 11:04:01AM -0300, Gustavo Padovan wrote:
> 2016-09-14 Chris Wilson <chris@chris-wilson.co.uk>:
> 
> > On Tue, Sep 13, 2016 at 04:24:27PM -0700, Rafael Antognolli wrote:
> > > The refcount of a fence should be increased whenever it is added to a merged
> > > fence, since it will later be decreased when the merged fence is destroyed.
> > > Failing to do so will cause the original fence to be freed if the merged fence
> > > gets freed, but other places still referencing won't know about it.
> > > 
> > > This patch fixes a kernel panic that can be triggered by creating a fence that
> > > is expired (or increasing the timeline until it expires), then creating a
> > > merged fence out of it, and deleting the merged fence. This will make the
> > > original expired fence's refcount go to zero.
> > > 
> > > Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
> > > ---
> > > 
> > > Sample code to trigger the mentioned kernel panic (might need to be executed a
> > > couple times before it actually breaks everything):
> > > 
> > > static void test_sync_expired_merge(void)
> > > {
> > >        int iterations = 1 << 20;
> > >        int timeline;
> > >        int i;
> > >        int fence_expired, fence_merged;
> > > 
> > >        timeline = sw_sync_timeline_create();
> > > 
> > >        sw_sync_timeline_inc(timeline, 100);
> > >        fence_expired = sw_sync_fence_create(timeline, 1);
> > >        fence_merged = sw_sync_merge(fence_expired, fence_expired);
> > >        sw_sync_fence_destroy(fence_merged);
> > > 
> > >        for (i = 0; i < iterations; i++) {
> > >                int fence = sw_sync_merge(fence_expired, fence_expired);
> > > 
> > >                igt_assert_f(sw_sync_wait(fence, -1) > 0,
> > >                                     "Failure waiting on fence\n");
> > >                sw_sync_fence_destroy(fence);
> > >        }
> > > 
> > >        sw_sync_fence_destroy(fence_expired);
> > > }
> > > 
> > >  drivers/dma-buf/sync_file.c | 7 ++-----
> > >  1 file changed, 2 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > > index 486d29c..6ce6b8f 100644
> > > --- a/drivers/dma-buf/sync_file.c
> > > +++ b/drivers/dma-buf/sync_file.c
> > > @@ -178,11 +178,8 @@ static struct fence **get_fences(struct sync_file *sync_file, int *num_fences)
> > >  static void add_fence(struct fence **fences, int *i, struct fence *fence)
> > >  {
> > >  	fences[*i] = fence;
> > > -
> > > -	if (!fence_is_signaled(fence)) {
> > > -		fence_get(fence);
> > > -		(*i)++;
> > > -	}
> > > +	fence_get(fence);
> > > +	(*i)++;
> > >  }
> > 
> > I think you'll find it's the caller:
> > 
> > if (i == 0) {
> > 	add_fence(fences, &i, a_fences[0]);
> > 	i++;
> > }
> > 
> > that does the unexpected.
> > 
> > This should be 
> > 
> > if (i == 0)
> > 	fences[i++] = fence_get(a_fences[0]);
> 
> You are right. Otherwise we would miss the extra reference. i == 0 here
> means that all fences are signalled.
> 
> > 
> > 
> > That ensures the sync_file inherits the signaled status without having
> > to keep all fences.

OK, so if you are building a merged fence out of several signaled
fences, then you only keep one of them instead? I haven't noticed that
was the intention.

> > I think there still seems to be a memory leak when calling
> > sync_file_set_fence() here with i == 1.
> 
> I think that is something we discussed already, we don't hold an extra
> ref there for i == 1 because it would leak the fence.
> 

I agree, we already increase the reference count in add_fence anyway (or
here, assuming we do what Chris suggested. But I also believe that's the
cause of confusion.

IMHO it would be better to call fence_get() inside fence_array_create(),
so it would match the fence_put in fence_array_release(). And if we have
only one fence, fence_get it inside sync_file_set_fence().

Rafael
Gustavo Padovan Sept. 14, 2016, 4:38 p.m. UTC | #5
2016-09-14 Rafael Antognolli <rafael.antognolli@intel.com>:

> On Wed, Sep 14, 2016 at 11:04:01AM -0300, Gustavo Padovan wrote:
> > 2016-09-14 Chris Wilson <chris@chris-wilson.co.uk>:
> > 
> > > On Tue, Sep 13, 2016 at 04:24:27PM -0700, Rafael Antognolli wrote:
> > > > The refcount of a fence should be increased whenever it is added to a merged
> > > > fence, since it will later be decreased when the merged fence is destroyed.
> > > > Failing to do so will cause the original fence to be freed if the merged fence
> > > > gets freed, but other places still referencing won't know about it.
> > > > 
> > > > This patch fixes a kernel panic that can be triggered by creating a fence that
> > > > is expired (or increasing the timeline until it expires), then creating a
> > > > merged fence out of it, and deleting the merged fence. This will make the
> > > > original expired fence's refcount go to zero.
> > > > 
> > > > Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
> > > > ---
> > > > 
> > > > Sample code to trigger the mentioned kernel panic (might need to be executed a
> > > > couple times before it actually breaks everything):
> > > > 
> > > > static void test_sync_expired_merge(void)
> > > > {
> > > >        int iterations = 1 << 20;
> > > >        int timeline;
> > > >        int i;
> > > >        int fence_expired, fence_merged;
> > > > 
> > > >        timeline = sw_sync_timeline_create();
> > > > 
> > > >        sw_sync_timeline_inc(timeline, 100);
> > > >        fence_expired = sw_sync_fence_create(timeline, 1);
> > > >        fence_merged = sw_sync_merge(fence_expired, fence_expired);
> > > >        sw_sync_fence_destroy(fence_merged);
> > > > 
> > > >        for (i = 0; i < iterations; i++) {
> > > >                int fence = sw_sync_merge(fence_expired, fence_expired);
> > > > 
> > > >                igt_assert_f(sw_sync_wait(fence, -1) > 0,
> > > >                                     "Failure waiting on fence\n");
> > > >                sw_sync_fence_destroy(fence);
> > > >        }
> > > > 
> > > >        sw_sync_fence_destroy(fence_expired);
> > > > }
> > > > 
> > > >  drivers/dma-buf/sync_file.c | 7 ++-----
> > > >  1 file changed, 2 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > > > index 486d29c..6ce6b8f 100644
> > > > --- a/drivers/dma-buf/sync_file.c
> > > > +++ b/drivers/dma-buf/sync_file.c
> > > > @@ -178,11 +178,8 @@ static struct fence **get_fences(struct sync_file *sync_file, int *num_fences)
> > > >  static void add_fence(struct fence **fences, int *i, struct fence *fence)
> > > >  {
> > > >  	fences[*i] = fence;
> > > > -
> > > > -	if (!fence_is_signaled(fence)) {
> > > > -		fence_get(fence);
> > > > -		(*i)++;
> > > > -	}
> > > > +	fence_get(fence);
> > > > +	(*i)++;
> > > >  }
> > > 
> > > I think you'll find it's the caller:
> > > 
> > > if (i == 0) {
> > > 	add_fence(fences, &i, a_fences[0]);
> > > 	i++;
> > > }
> > > 
> > > that does the unexpected.
> > > 
> > > This should be 
> > > 
> > > if (i == 0)
> > > 	fences[i++] = fence_get(a_fences[0]);
> > 
> > You are right. Otherwise we would miss the extra reference. i == 0 here
> > means that all fences are signalled.
> > 
> > > 
> > > 
> > > That ensures the sync_file inherits the signaled status without having
> > > to keep all fences.
> 
> OK, so if you are building a merged fence out of several signaled
> fences, then you only keep one of them instead? I haven't noticed that
> was the intention.
> 
> > > I think there still seems to be a memory leak when calling
> > > sync_file_set_fence() here with i == 1.
> > 
> > I think that is something we discussed already, we don't hold an extra
> > ref there for i == 1 because it would leak the fence.
> > 
> 
> I agree, we already increase the reference count in add_fence anyway (or
> here, assuming we do what Chris suggested. But I also believe that's the
> cause of confusion.
> 
> IMHO it would be better to call fence_get() inside fence_array_create(),
> so it would match the fence_put in fence_array_release(). And if we have
> only one fence, fence_get it inside sync_file_set_fence().

sync_file is not the only user of fence_array, so I would not push this
behaviour changes to others users. The current behaviour is well
documented inside sync_file_set_fence, not sure we need to change it.

Gustavo
Chris Wilson Sept. 14, 2016, 5:57 p.m. UTC | #6
On Wed, Sep 14, 2016 at 11:04:01AM -0300, Gustavo Padovan wrote:
> 2016-09-14 Chris Wilson <chris@chris-wilson.co.uk>:
> > I think there still seems to be a memory leak when calling
> > sync_file_set_fence() here with i == 1.
> 
> I think that is something we discussed already, we don't hold an extra
> ref there for i == 1 because it would leak the fence.

That's fine. I'm worried about the array we kmalloc here but
sync_file->fence = fences[0]. Where does the array go?

I *think* we need a if (i == 1) kfree(fences);
-Chris
diff mbox

Patch

diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index 486d29c..6ce6b8f 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -178,11 +178,8 @@  static struct fence **get_fences(struct sync_file *sync_file, int *num_fences)
 static void add_fence(struct fence **fences, int *i, struct fence *fence)
 {
 	fences[*i] = fence;
-
-	if (!fence_is_signaled(fence)) {
-		fence_get(fence);
-		(*i)++;
-	}
+	fence_get(fence);
+	(*i)++;
 }
 
 /**