Message ID | 1473809067-6019-1-git-send-email-rafael.antognolli@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Rafael, 2016-09-13 Rafael Antognolli <rafael.antognolli@intel.com>: > The refcount of a fence should be increased whenever it is added to a merged > fence, since it will later be decreased when the merged fence is destroyed. > Failing to do so will cause the original fence to be freed if the merged fence > gets freed, but other places still referencing won't know about it. > > This patch fixes a kernel panic that can be triggered by creating a fence that > is expired (or increasing the timeline until it expires), then creating a > merged fence out of it, and deleting the merged fence. This will make the > original expired fence's refcount go to zero. > > Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> > --- > > Sample code to trigger the mentioned kernel panic (might need to be executed a > couple times before it actually breaks everything): > > static void test_sync_expired_merge(void) > { > int iterations = 1 << 20; > int timeline; > int i; > int fence_expired, fence_merged; > > timeline = sw_sync_timeline_create(); > > sw_sync_timeline_inc(timeline, 100); > fence_expired = sw_sync_fence_create(timeline, 1); > fence_merged = sw_sync_merge(fence_expired, fence_expired); > sw_sync_fence_destroy(fence_merged); > > for (i = 0; i < iterations; i++) { > int fence = sw_sync_merge(fence_expired, fence_expired); > > igt_assert_f(sw_sync_wait(fence, -1) > 0, > "Failure waiting on fence\n"); > sw_sync_fence_destroy(fence); > } > > sw_sync_fence_destroy(fence_expired); > } > > drivers/dma-buf/sync_file.c | 7 ++----- > 1 file changed, 2 insertions(+), 5 deletions(-) Thanks for spotting this. Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Gustavo
On Tue, Sep 13, 2016 at 04:24:27PM -0700, Rafael Antognolli wrote: > The refcount of a fence should be increased whenever it is added to a merged > fence, since it will later be decreased when the merged fence is destroyed. > Failing to do so will cause the original fence to be freed if the merged fence > gets freed, but other places still referencing won't know about it. > > This patch fixes a kernel panic that can be triggered by creating a fence that > is expired (or increasing the timeline until it expires), then creating a > merged fence out of it, and deleting the merged fence. This will make the > original expired fence's refcount go to zero. > > Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> > --- > > Sample code to trigger the mentioned kernel panic (might need to be executed a > couple times before it actually breaks everything): > > static void test_sync_expired_merge(void) > { > int iterations = 1 << 20; > int timeline; > int i; > int fence_expired, fence_merged; > > timeline = sw_sync_timeline_create(); > > sw_sync_timeline_inc(timeline, 100); > fence_expired = sw_sync_fence_create(timeline, 1); > fence_merged = sw_sync_merge(fence_expired, fence_expired); > sw_sync_fence_destroy(fence_merged); > > for (i = 0; i < iterations; i++) { > int fence = sw_sync_merge(fence_expired, fence_expired); > > igt_assert_f(sw_sync_wait(fence, -1) > 0, > "Failure waiting on fence\n"); > sw_sync_fence_destroy(fence); > } > > sw_sync_fence_destroy(fence_expired); > } > > drivers/dma-buf/sync_file.c | 7 ++----- > 1 file changed, 2 insertions(+), 5 deletions(-) > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c > index 486d29c..6ce6b8f 100644 > --- a/drivers/dma-buf/sync_file.c > +++ b/drivers/dma-buf/sync_file.c > @@ -178,11 +178,8 @@ static struct fence **get_fences(struct sync_file *sync_file, int *num_fences) > static void add_fence(struct fence **fences, int *i, struct fence *fence) > { > fences[*i] = fence; > - > - if (!fence_is_signaled(fence)) { > - fence_get(fence); > - (*i)++; > - } > + fence_get(fence); > + (*i)++; > } I think you'll find it's the caller: if (i == 0) { add_fence(fences, &i, a_fences[0]); i++; } that does the unexpected. This should be if (i == 0) fences[i++] = fence_get(a_fences[0]); That ensures the sync_file inherits the signaled status without having to keep all fences. I think there still seems to be a memory leak when calling sync_file_set_fence() here with i == 1. -Chris
2016-09-14 Chris Wilson <chris@chris-wilson.co.uk>: > On Tue, Sep 13, 2016 at 04:24:27PM -0700, Rafael Antognolli wrote: > > The refcount of a fence should be increased whenever it is added to a merged > > fence, since it will later be decreased when the merged fence is destroyed. > > Failing to do so will cause the original fence to be freed if the merged fence > > gets freed, but other places still referencing won't know about it. > > > > This patch fixes a kernel panic that can be triggered by creating a fence that > > is expired (or increasing the timeline until it expires), then creating a > > merged fence out of it, and deleting the merged fence. This will make the > > original expired fence's refcount go to zero. > > > > Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> > > --- > > > > Sample code to trigger the mentioned kernel panic (might need to be executed a > > couple times before it actually breaks everything): > > > > static void test_sync_expired_merge(void) > > { > > int iterations = 1 << 20; > > int timeline; > > int i; > > int fence_expired, fence_merged; > > > > timeline = sw_sync_timeline_create(); > > > > sw_sync_timeline_inc(timeline, 100); > > fence_expired = sw_sync_fence_create(timeline, 1); > > fence_merged = sw_sync_merge(fence_expired, fence_expired); > > sw_sync_fence_destroy(fence_merged); > > > > for (i = 0; i < iterations; i++) { > > int fence = sw_sync_merge(fence_expired, fence_expired); > > > > igt_assert_f(sw_sync_wait(fence, -1) > 0, > > "Failure waiting on fence\n"); > > sw_sync_fence_destroy(fence); > > } > > > > sw_sync_fence_destroy(fence_expired); > > } > > > > drivers/dma-buf/sync_file.c | 7 ++----- > > 1 file changed, 2 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c > > index 486d29c..6ce6b8f 100644 > > --- a/drivers/dma-buf/sync_file.c > > +++ b/drivers/dma-buf/sync_file.c > > @@ -178,11 +178,8 @@ static struct fence **get_fences(struct sync_file *sync_file, int *num_fences) > > static void add_fence(struct fence **fences, int *i, struct fence *fence) > > { > > fences[*i] = fence; > > - > > - if (!fence_is_signaled(fence)) { > > - fence_get(fence); > > - (*i)++; > > - } > > + fence_get(fence); > > + (*i)++; > > } > > I think you'll find it's the caller: > > if (i == 0) { > add_fence(fences, &i, a_fences[0]); > i++; > } > > that does the unexpected. > > This should be > > if (i == 0) > fences[i++] = fence_get(a_fences[0]); You are right. Otherwise we would miss the extra reference. i == 0 here means that all fences are signalled. > > > That ensures the sync_file inherits the signaled status without having > to keep all fences. > > I think there still seems to be a memory leak when calling > sync_file_set_fence() here with i == 1. I think that is something we discussed already, we don't hold an extra ref there for i == 1 because it would leak the fence. Gustavo
On Wed, Sep 14, 2016 at 11:04:01AM -0300, Gustavo Padovan wrote: > 2016-09-14 Chris Wilson <chris@chris-wilson.co.uk>: > > > On Tue, Sep 13, 2016 at 04:24:27PM -0700, Rafael Antognolli wrote: > > > The refcount of a fence should be increased whenever it is added to a merged > > > fence, since it will later be decreased when the merged fence is destroyed. > > > Failing to do so will cause the original fence to be freed if the merged fence > > > gets freed, but other places still referencing won't know about it. > > > > > > This patch fixes a kernel panic that can be triggered by creating a fence that > > > is expired (or increasing the timeline until it expires), then creating a > > > merged fence out of it, and deleting the merged fence. This will make the > > > original expired fence's refcount go to zero. > > > > > > Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> > > > --- > > > > > > Sample code to trigger the mentioned kernel panic (might need to be executed a > > > couple times before it actually breaks everything): > > > > > > static void test_sync_expired_merge(void) > > > { > > > int iterations = 1 << 20; > > > int timeline; > > > int i; > > > int fence_expired, fence_merged; > > > > > > timeline = sw_sync_timeline_create(); > > > > > > sw_sync_timeline_inc(timeline, 100); > > > fence_expired = sw_sync_fence_create(timeline, 1); > > > fence_merged = sw_sync_merge(fence_expired, fence_expired); > > > sw_sync_fence_destroy(fence_merged); > > > > > > for (i = 0; i < iterations; i++) { > > > int fence = sw_sync_merge(fence_expired, fence_expired); > > > > > > igt_assert_f(sw_sync_wait(fence, -1) > 0, > > > "Failure waiting on fence\n"); > > > sw_sync_fence_destroy(fence); > > > } > > > > > > sw_sync_fence_destroy(fence_expired); > > > } > > > > > > drivers/dma-buf/sync_file.c | 7 ++----- > > > 1 file changed, 2 insertions(+), 5 deletions(-) > > > > > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c > > > index 486d29c..6ce6b8f 100644 > > > --- a/drivers/dma-buf/sync_file.c > > > +++ b/drivers/dma-buf/sync_file.c > > > @@ -178,11 +178,8 @@ static struct fence **get_fences(struct sync_file *sync_file, int *num_fences) > > > static void add_fence(struct fence **fences, int *i, struct fence *fence) > > > { > > > fences[*i] = fence; > > > - > > > - if (!fence_is_signaled(fence)) { > > > - fence_get(fence); > > > - (*i)++; > > > - } > > > + fence_get(fence); > > > + (*i)++; > > > } > > > > I think you'll find it's the caller: > > > > if (i == 0) { > > add_fence(fences, &i, a_fences[0]); > > i++; > > } > > > > that does the unexpected. > > > > This should be > > > > if (i == 0) > > fences[i++] = fence_get(a_fences[0]); > > You are right. Otherwise we would miss the extra reference. i == 0 here > means that all fences are signalled. > > > > > > > That ensures the sync_file inherits the signaled status without having > > to keep all fences. OK, so if you are building a merged fence out of several signaled fences, then you only keep one of them instead? I haven't noticed that was the intention. > > I think there still seems to be a memory leak when calling > > sync_file_set_fence() here with i == 1. > > I think that is something we discussed already, we don't hold an extra > ref there for i == 1 because it would leak the fence. > I agree, we already increase the reference count in add_fence anyway (or here, assuming we do what Chris suggested. But I also believe that's the cause of confusion. IMHO it would be better to call fence_get() inside fence_array_create(), so it would match the fence_put in fence_array_release(). And if we have only one fence, fence_get it inside sync_file_set_fence(). Rafael
2016-09-14 Rafael Antognolli <rafael.antognolli@intel.com>: > On Wed, Sep 14, 2016 at 11:04:01AM -0300, Gustavo Padovan wrote: > > 2016-09-14 Chris Wilson <chris@chris-wilson.co.uk>: > > > > > On Tue, Sep 13, 2016 at 04:24:27PM -0700, Rafael Antognolli wrote: > > > > The refcount of a fence should be increased whenever it is added to a merged > > > > fence, since it will later be decreased when the merged fence is destroyed. > > > > Failing to do so will cause the original fence to be freed if the merged fence > > > > gets freed, but other places still referencing won't know about it. > > > > > > > > This patch fixes a kernel panic that can be triggered by creating a fence that > > > > is expired (or increasing the timeline until it expires), then creating a > > > > merged fence out of it, and deleting the merged fence. This will make the > > > > original expired fence's refcount go to zero. > > > > > > > > Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> > > > > --- > > > > > > > > Sample code to trigger the mentioned kernel panic (might need to be executed a > > > > couple times before it actually breaks everything): > > > > > > > > static void test_sync_expired_merge(void) > > > > { > > > > int iterations = 1 << 20; > > > > int timeline; > > > > int i; > > > > int fence_expired, fence_merged; > > > > > > > > timeline = sw_sync_timeline_create(); > > > > > > > > sw_sync_timeline_inc(timeline, 100); > > > > fence_expired = sw_sync_fence_create(timeline, 1); > > > > fence_merged = sw_sync_merge(fence_expired, fence_expired); > > > > sw_sync_fence_destroy(fence_merged); > > > > > > > > for (i = 0; i < iterations; i++) { > > > > int fence = sw_sync_merge(fence_expired, fence_expired); > > > > > > > > igt_assert_f(sw_sync_wait(fence, -1) > 0, > > > > "Failure waiting on fence\n"); > > > > sw_sync_fence_destroy(fence); > > > > } > > > > > > > > sw_sync_fence_destroy(fence_expired); > > > > } > > > > > > > > drivers/dma-buf/sync_file.c | 7 ++----- > > > > 1 file changed, 2 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c > > > > index 486d29c..6ce6b8f 100644 > > > > --- a/drivers/dma-buf/sync_file.c > > > > +++ b/drivers/dma-buf/sync_file.c > > > > @@ -178,11 +178,8 @@ static struct fence **get_fences(struct sync_file *sync_file, int *num_fences) > > > > static void add_fence(struct fence **fences, int *i, struct fence *fence) > > > > { > > > > fences[*i] = fence; > > > > - > > > > - if (!fence_is_signaled(fence)) { > > > > - fence_get(fence); > > > > - (*i)++; > > > > - } > > > > + fence_get(fence); > > > > + (*i)++; > > > > } > > > > > > I think you'll find it's the caller: > > > > > > if (i == 0) { > > > add_fence(fences, &i, a_fences[0]); > > > i++; > > > } > > > > > > that does the unexpected. > > > > > > This should be > > > > > > if (i == 0) > > > fences[i++] = fence_get(a_fences[0]); > > > > You are right. Otherwise we would miss the extra reference. i == 0 here > > means that all fences are signalled. > > > > > > > > > > > That ensures the sync_file inherits the signaled status without having > > > to keep all fences. > > OK, so if you are building a merged fence out of several signaled > fences, then you only keep one of them instead? I haven't noticed that > was the intention. > > > > I think there still seems to be a memory leak when calling > > > sync_file_set_fence() here with i == 1. > > > > I think that is something we discussed already, we don't hold an extra > > ref there for i == 1 because it would leak the fence. > > > > I agree, we already increase the reference count in add_fence anyway (or > here, assuming we do what Chris suggested. But I also believe that's the > cause of confusion. > > IMHO it would be better to call fence_get() inside fence_array_create(), > so it would match the fence_put in fence_array_release(). And if we have > only one fence, fence_get it inside sync_file_set_fence(). sync_file is not the only user of fence_array, so I would not push this behaviour changes to others users. The current behaviour is well documented inside sync_file_set_fence, not sure we need to change it. Gustavo
On Wed, Sep 14, 2016 at 11:04:01AM -0300, Gustavo Padovan wrote: > 2016-09-14 Chris Wilson <chris@chris-wilson.co.uk>: > > I think there still seems to be a memory leak when calling > > sync_file_set_fence() here with i == 1. > > I think that is something we discussed already, we don't hold an extra > ref there for i == 1 because it would leak the fence. That's fine. I'm worried about the array we kmalloc here but sync_file->fence = fences[0]. Where does the array go? I *think* we need a if (i == 1) kfree(fences); -Chris
diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c index 486d29c..6ce6b8f 100644 --- a/drivers/dma-buf/sync_file.c +++ b/drivers/dma-buf/sync_file.c @@ -178,11 +178,8 @@ static struct fence **get_fences(struct sync_file *sync_file, int *num_fences) static void add_fence(struct fence **fences, int *i, struct fence *fence) { fences[*i] = fence; - - if (!fence_is_signaled(fence)) { - fence_get(fence); - (*i)++; - } + fence_get(fence); + (*i)++; } /**
The refcount of a fence should be increased whenever it is added to a merged fence, since it will later be decreased when the merged fence is destroyed. Failing to do so will cause the original fence to be freed if the merged fence gets freed, but other places still referencing won't know about it. This patch fixes a kernel panic that can be triggered by creating a fence that is expired (or increasing the timeline until it expires), then creating a merged fence out of it, and deleting the merged fence. This will make the original expired fence's refcount go to zero. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> --- Sample code to trigger the mentioned kernel panic (might need to be executed a couple times before it actually breaks everything): static void test_sync_expired_merge(void) { int iterations = 1 << 20; int timeline; int i; int fence_expired, fence_merged; timeline = sw_sync_timeline_create(); sw_sync_timeline_inc(timeline, 100); fence_expired = sw_sync_fence_create(timeline, 1); fence_merged = sw_sync_merge(fence_expired, fence_expired); sw_sync_fence_destroy(fence_merged); for (i = 0; i < iterations; i++) { int fence = sw_sync_merge(fence_expired, fence_expired); igt_assert_f(sw_sync_wait(fence, -1) > 0, "Failure waiting on fence\n"); sw_sync_fence_destroy(fence); } sw_sync_fence_destroy(fence_expired); } drivers/dma-buf/sync_file.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)