Message ID: 20230615024008.1600281-1-airlied@gmail.com
State:      New, archived
Series:     nouveau: fix client work fence deletion race
On Thu, Jun 15, 2023 at 4:47 AM Dave Airlie <airlied@gmail.com> wrote:
>
> From: Dave Airlie <airlied@redhat.com>
>
> This seems to have existed forever but is now more apparent after
> 9bff18d13473a9fdf81d5158248472a9d8ecf2bd (drm/ttm: use per BO cleanup workers).
>
> My analysis:
> Two threads are running.
>
> One is in the irq signalling the fence, in dma_fence_signal_timestamp_locked;
> it has set DMA_FENCE_FLAG_SIGNALLED_BIT but hasn't yet reached the callbacks.
>
> The second thread is in nouveau_cli_work_ready, where it sees the fence is
> signalled, so it puts the fence, cleans up the object, and frees the work
> item, which contains the callback.
>
> Thread one resumes, tries to call the callback, and causes the use-after-free.
>
> Proposed fix:
> Lock the fence signalled check in nouveau_cli_work_ready, so that either the
> callbacks are done or the memory is freed.
>
> Signed-off-by: Dave Airlie <airlied@redhat.com>

Fixes: 11e451e74050 ("drm/nouveau: remove fence wait code from deferred client work handler")

is, I think, the most reasonable commit to tag to ensure backports.

> ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index cc7c5b4a05fd..1a45be769848 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -137,10 +137,16 @@ nouveau_name(struct drm_device *dev)
>  static inline bool
>  nouveau_cli_work_ready(struct dma_fence *fence)
>  {
> -	if (!dma_fence_is_signaled(fence))
> -		return false;
> -	dma_fence_put(fence);
> -	return true;
> +	unsigned long flags;
> +	bool ret = true;
> +	spin_lock_irqsave(fence->lock, flags);
> +	if (!dma_fence_is_signaled_locked(fence))
> +		ret = false;
> +	spin_unlock_irqrestore(fence->lock, flags);

I agree with the code being always broken, as dma_fence_is_signaled_locked
specifies: "This function requires &dma_fence.lock to be held."

> +
> +	if (ret == true)
> +		dma_fence_put(fence);
> +	return ret;
>  }
>
>  static void
> --
> 2.40.1

Regardless of the patch fixing the issue users have seen:

Reviewed-by: Karol Herbst <kherbst@redhat.com>
On Thursday, 2023-06-15 at 12:40 +1000, Dave Airlie wrote:
> From: Dave Airlie <airlied@redhat.com>
>
> This seems to have existed forever but is now more apparent after
> 9bff18d13473a9fdf81d5158248472a9d8ecf2bd (drm/ttm: use per BO cleanup workers).
>
> My analysis:
> Two threads are running.
>
> One is in the irq signalling the fence, in dma_fence_signal_timestamp_locked;
> it has set DMA_FENCE_FLAG_SIGNALLED_BIT but hasn't yet reached the callbacks.
>
> The second thread is in nouveau_cli_work_ready, where it sees the fence is
> signalled, so it puts the fence, cleans up the object, and frees the work
> item, which contains the callback.
>
> Thread one resumes, tries to call the callback, and causes the use-after-free.
>
> Proposed fix:
> Lock the fence signalled check in nouveau_cli_work_ready, so that either the
> callbacks are done or the memory is freed.
>
> Signed-off-by: Dave Airlie <airlied@redhat.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index cc7c5b4a05fd..1a45be769848 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -137,10 +137,16 @@ nouveau_name(struct drm_device *dev)
>  static inline bool
>  nouveau_cli_work_ready(struct dma_fence *fence)
>  {
> -	if (!dma_fence_is_signaled(fence))
> -		return false;
> -	dma_fence_put(fence);
> -	return true;
> +	unsigned long flags;
> +	bool ret = true;
> +	spin_lock_irqsave(fence->lock, flags);

This function is only ever called from worker (process) context, so there is
no point in saving the current irq state. This can be a plain spin_lock_irq.

Regards,
Lucas

> +	if (!dma_fence_is_signaled_locked(fence))
> +		ret = false;
> +	spin_unlock_irqrestore(fence->lock, flags);
> +
> +	if (ret == true)
> +		dma_fence_put(fence);
> +	return ret;
>  }
>
>  static void
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index cc7c5b4a05fd..1a45be769848 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -137,10 +137,16 @@ nouveau_name(struct drm_device *dev)
 static inline bool
 nouveau_cli_work_ready(struct dma_fence *fence)
 {
-	if (!dma_fence_is_signaled(fence))
-		return false;
-	dma_fence_put(fence);
-	return true;
+	unsigned long flags;
+	bool ret = true;
+	spin_lock_irqsave(fence->lock, flags);
+	if (!dma_fence_is_signaled_locked(fence))
+		ret = false;
+	spin_unlock_irqrestore(fence->lock, flags);
+
+	if (ret == true)
+		dma_fence_put(fence);
+	return ret;
 }
 
 static void