Message ID | 88E16111-86C0-41BC-90F9-A0A517894B5B@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] drm/ttm: Fix a deadlock if the target BO is not idle during swap | expand |
Am 03.09.21 um 08:49 schrieb Pan, Xinhui: > The ret value might be -EBUSY, caller will think lru lock is still > locked but actually NOT. So return -ENOSPC instead. Otherwise we hit > list corruption. > > ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0, > caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will > be stuck as we actually did not free any BO memory. This usually happens > when the fence is not signaled for a long time. > > Signed-off-by: xinhui pan <xinhui.pan@amd.com> Good catch, wanted to rework this for a very long time because of potential bugs like those. Reviewed-by: Christian König <christian.koenig@amd.com> > --- > drivers/gpu/drm/ttm/ttm_bo.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > index 1fedd0eb67ba..f1367107925b 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -1159,9 +1159,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, > } > > if (bo->deleted) { > - ttm_bo_cleanup_refs(bo, false, false, locked); > + ret = ttm_bo_cleanup_refs(bo, false, false, locked); > ttm_bo_put(bo); > - return 0; > + return ret == -EBUSY ? -ENOSPC : ret; > } > > ttm_bo_move_to_pinned(bo); > @@ -1215,7 +1215,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, > if (locked) > dma_resv_unlock(bo->base.resv); > ttm_bo_put(bo); > - return ret; > + return ret == -EBUSY ? -ENOSPC : ret; > } > > void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
Pan, you're sending patches to amd-gfx-bounces@lists.freedesktop.org, which doesn't work. You need to send them to amd-gfx@lists.freedesktop.org instead.
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 1fedd0eb67ba..f1367107925b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1159,9 +1159,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, } if (bo->deleted) { - ttm_bo_cleanup_refs(bo, false, false, locked); + ret = ttm_bo_cleanup_refs(bo, false, false, locked); ttm_bo_put(bo); - return 0; + return ret == -EBUSY ? -ENOSPC : ret; } ttm_bo_move_to_pinned(bo); @@ -1215,7 +1215,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, if (locked) dma_resv_unlock(bo->base.resv); ttm_bo_put(bo); - return ret; + return ret == -EBUSY ? -ENOSPC : ret; } void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
The ret value might be -EBUSY, caller will think lru lock is still locked but actually NOT. So return -ENOSPC instead. Otherwise we hit list corruption. ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0, caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will be stuck as we actually did not free any BO memory. This usually happens when the fence is not signaled for a long time. Signed-off-by: xinhui pan <xinhui.pan@amd.com> --- drivers/gpu/drm/ttm/ttm_bo.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)