Message ID | 20230712222523.7404-1-robdclark@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/msm: Fix hw_fence error path cleanup | expand |
On 13/07/2023 01:25, Rob Clark wrote: > From: Rob Clark <robdclark@chromium.org> > > In an error path where the submit is free'd without the job being run, > the hw_fence pointer is simply a kzalloc'd block of memory. In this > case we should just kfree() it, rather than trying to decrement it's > reference count. Fortunately we can tell that this is the case by > checking for a zero refcount, since if the job was run, the submit would > be holding a reference to the hw_fence. > > Fixes: f94e6a51e17c ("drm/msm: Pre-allocate hw_fence") > Signed-off-by: Rob Clark <robdclark@chromium.org> > --- > drivers/gpu/drm/msm/msm_fence.c | 6 ++++++ > drivers/gpu/drm/msm/msm_gem_submit.c | 14 +++++++++++++- > 2 files changed, 19 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c > index 96599ec3eb78..1a5d4f1c8b42 100644 > --- a/drivers/gpu/drm/msm/msm_fence.c > +++ b/drivers/gpu/drm/msm/msm_fence.c > @@ -191,6 +191,12 @@ msm_fence_init(struct dma_fence *fence, struct msm_fence_context *fctx) > > f->fctx = fctx; > > + /* > + * Until this point, the fence was just some pre-allocated memory, > + * no-one should have taken a reference to it yet. > + */ > + WARN_ON(kref_read(&fence->refcount)); It this really correct to return a refcounted object with 0 refcount (I'm looking at submit_create() / msm_fence_alloc() )? Maybe it would be better to move dma_fence_get() to msm_fence_alloc() ? But don't immediately see, which one should be moved. > + > dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock, > fctx->context, ++fctx->last_fence); > } > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c > index 3f1aa4de3b87..9d66498cdc04 100644 > --- a/drivers/gpu/drm/msm/msm_gem_submit.c > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c > @@ -86,7 +86,19 @@ void __msm_gem_submit_destroy(struct kref *kref) > } > > dma_fence_put(submit->user_fence); > - dma_fence_put(submit->hw_fence); > + > + /* > + * If the submit is freed before msm_job_run(), then hw_fence is > + * just some pre-allocated memory, not a reference counted fence. > + * Once the job runs and the hw_fence is initialized, it will > + * have a refcount of at least one, since the submit holds a ref > + * to the hw_fence. > + */ > + if (kref_read(&submit->hw_fence->refcount) == 0) { > + kfree(submit->hw_fence); > + } else { > + dma_fence_put(submit->hw_fence); > + } > > put_pid(submit->pid); > msm_submitqueue_put(submit->queue);
On Thu, Jul 13, 2023 at 1:03 PM Dmitry Baryshkov <dmitry.baryshkov@linaro.org> wrote: > > On 13/07/2023 01:25, Rob Clark wrote: > > From: Rob Clark <robdclark@chromium.org> > > > > In an error path where the submit is free'd without the job being run, > > the hw_fence pointer is simply a kzalloc'd block of memory. In this > > case we should just kfree() it, rather than trying to decrement it's > > reference count. Fortunately we can tell that this is the case by > > checking for a zero refcount, since if the job was run, the submit would > > be holding a reference to the hw_fence. > > > > Fixes: f94e6a51e17c ("drm/msm: Pre-allocate hw_fence") > > Signed-off-by: Rob Clark <robdclark@chromium.org> > > --- > > drivers/gpu/drm/msm/msm_fence.c | 6 ++++++ > > drivers/gpu/drm/msm/msm_gem_submit.c | 14 +++++++++++++- > > 2 files changed, 19 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c > > index 96599ec3eb78..1a5d4f1c8b42 100644 > > --- a/drivers/gpu/drm/msm/msm_fence.c > > +++ b/drivers/gpu/drm/msm/msm_fence.c > > @@ -191,6 +191,12 @@ msm_fence_init(struct dma_fence *fence, struct msm_fence_context *fctx) > > > > f->fctx = fctx; > > > > + /* > > + * Until this point, the fence was just some pre-allocated memory, > > + * no-one should have taken a reference to it yet. > > + */ > > + WARN_ON(kref_read(&fence->refcount)); > > It this really correct to return a refcounted object with 0 refcount > (I'm looking at submit_create() / msm_fence_alloc() )? Maybe it would be > better to move dma_fence_get() to msm_fence_alloc() ? But don't > immediately see, which one should be moved. The issue is that we can't really initialize the fence until msm_job_run(), when it is known what order the fence would be signaled. But we don't want to do any allocations in msm_job_run() because that could trigger the shrinker, which could need to wait until jobs complete to release memory, forming a deadlock. BR, -R > > + > > dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock, > > fctx->context, ++fctx->last_fence); > > } > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c > > index 3f1aa4de3b87..9d66498cdc04 100644 > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c > > @@ -86,7 +86,19 @@ void __msm_gem_submit_destroy(struct kref *kref) > > } > > > > dma_fence_put(submit->user_fence); > > - dma_fence_put(submit->hw_fence); > > + > > + /* > > + * If the submit is freed before msm_job_run(), then hw_fence is > > + * just some pre-allocated memory, not a reference counted fence. > > + * Once the job runs and the hw_fence is initialized, it will > > + * have a refcount of at least one, since the submit holds a ref > > + * to the hw_fence. > > + */ > > + if (kref_read(&submit->hw_fence->refcount) == 0) { > > + kfree(submit->hw_fence); > > + } else { > > + dma_fence_put(submit->hw_fence); > > + } > > > > put_pid(submit->pid); > > msm_submitqueue_put(submit->queue); > > -- > With best wishes > Dmitry >
diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c index 96599ec3eb78..1a5d4f1c8b42 100644 --- a/drivers/gpu/drm/msm/msm_fence.c +++ b/drivers/gpu/drm/msm/msm_fence.c @@ -191,6 +191,12 @@ msm_fence_init(struct dma_fence *fence, struct msm_fence_context *fctx) f->fctx = fctx; + /* + * Until this point, the fence was just some pre-allocated memory, + * no-one should have taken a reference to it yet. + */ + WARN_ON(kref_read(&fence->refcount)); + dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock, fctx->context, ++fctx->last_fence); } diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c index 3f1aa4de3b87..9d66498cdc04 100644 --- a/drivers/gpu/drm/msm/msm_gem_submit.c +++ b/drivers/gpu/drm/msm/msm_gem_submit.c @@ -86,7 +86,19 @@ void __msm_gem_submit_destroy(struct kref *kref) } dma_fence_put(submit->user_fence); - dma_fence_put(submit->hw_fence); + + /* + * If the submit is freed before msm_job_run(), then hw_fence is + * just some pre-allocated memory, not a reference counted fence. + * Once the job runs and the hw_fence is initialized, it will + * have a refcount of at least one, since the submit holds a ref + * to the hw_fence. + */ + if (kref_read(&submit->hw_fence->refcount) == 0) { + kfree(submit->hw_fence); + } else { + dma_fence_put(submit->hw_fence); + } put_pid(submit->pid); msm_submitqueue_put(submit->queue);