Message ID | 20180626143147.14296-1-michel@daenzer.net (mailing list archive) |
---|---|
State | New, archived |
Quoting Michel Dänzer (2018-06-26 15:31:47)
> From: Michel Dänzer <michel.daenzer@amd.com>
>
> Fixes the BUG_ON spuriously triggering under the following
> circumstances:
>
> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>   the same reservation object, so it calls
>   reservation_object_reserve_shared with that reservation object once
>   for each such BO.
> * In reservation_object_reserve_shared, old->shared_count ==
>   old->shared_max - 1, so obj->staged is freed in preparation of an
>   in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>   once for each of the BOs above, always with the same fence.
> * The first call adds the fence in the remaining free slot, after which
>   old->shared_count == old->shared_max.
>
> In the next call to reservation_object_add_shared_fence, the BUG_ON
> triggers. However, nothing bad would happen in
> reservation_object_add_shared_inplace, since the fence is already in the
> reservation object.
>
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

I've convinced myself (or rather have not found a valid argument
against) that being able to call reserve_shared + add_shared multiple
times for the same fence is an intended part of the reservation_object
API.

I'd double check with Christian though.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

> drivers/dma-buf/reservation.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 314eb1071cce..532545b9488e 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>  	if (signaled) {
>  		RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
>  	} else {
> +		BUG_ON(fobj->shared_count >= fobj->shared_max);

Personally I would just let kasan detect this and throw away the BUG_ON,
or at least move it behind some DMABUF_BUG_ON().
-Chris
On 2018-06-27 01:50 PM, Chris Wilson wrote:
> Quoting Michel Dänzer (2018-06-26 15:31:47)
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> Fixes the BUG_ON spuriously triggering under the following
>> circumstances:
>>
>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>>   the same reservation object, so it calls
>>   reservation_object_reserve_shared with that reservation object once
>>   for each such BO.
>> * In reservation_object_reserve_shared, old->shared_count ==
>>   old->shared_max - 1, so obj->staged is freed in preparation of an
>>   in-place update.
>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>>   once for each of the BOs above, always with the same fence.
>> * The first call adds the fence in the remaining free slot, after which
>>   old->shared_count == old->shared_max.
>>
>> In the next call to reservation_object_add_shared_fence, the BUG_ON
>> triggers. However, nothing bad would happen in
>> reservation_object_add_shared_inplace, since the fence is already in the
>> reservation object.
>>
>> Prevent this by moving the BUG_ON to where an overflow would actually
>> happen (e.g. if a buggy caller didn't call
>> reservation_object_reserve_shared before).
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>
> I've convinced myself (or rather have not found a valid argument
> against) that being able to call reserve_shared + add_shared multiple
> times for the same fence is an intended part of the reservation_object
> API.
>
> I'd double check with Christian though.

Right, I'm interested in Christian's feedback.

> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Thanks!
>> drivers/dma-buf/reservation.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
>> index 314eb1071cce..532545b9488e 100644
>> --- a/drivers/dma-buf/reservation.c
>> +++ b/drivers/dma-buf/reservation.c
>> @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>>  	if (signaled) {
>>  		RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
>>  	} else {
>> +		BUG_ON(fobj->shared_count >= fobj->shared_max);
>
> Personally I would just let kasan detect this and throw away the BUG_ON
> or at least move it behind some DMABUF_BUG_ON().

Hmm. Normally, I'm not a fan of BUG(_ON) either. But in this case, it's
clear that the caller is buggy, and proceeding to write beyond the end
of the array could have far-reaching consequences.

I'm leaving that to somebody else.
On 26.06.2018 16:31, Michel Dänzer wrote:
> From: Michel Dänzer <michel.daenzer@amd.com>
>
> Fixes the BUG_ON spuriously triggering under the following
> circumstances:
>
> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>   the same reservation object, so it calls
>   reservation_object_reserve_shared with that reservation object once
>   for each such BO.
> * In reservation_object_reserve_shared, old->shared_count ==
>   old->shared_max - 1, so obj->staged is freed in preparation of an
>   in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>   once for each of the BOs above, always with the same fence.
> * The first call adds the fence in the remaining free slot, after which
>   old->shared_count == old->shared_max.

Well, the explanation here is not correct. For multiple BOs using the
same reservation object we won't call
reservation_object_add_shared_fence() multiple times, because we move
those to the duplicates list in ttm_eu_reserve_buffers().

But this bug can still happen, because we call
reservation_object_add_shared_fence() manually with fences for the same
context in a couple of places.

One prominent case which comes to mind is the VM BOs during updates.
Another possibility is VRAM BOs which need to be cleared.

> In the next call to reservation_object_add_shared_fence, the BUG_ON
> triggers. However, nothing bad would happen in
> reservation_object_add_shared_inplace, since the fence is already in the
> reservation object.
>
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

Regards,
Christian.
> ---
>  drivers/dma-buf/reservation.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 314eb1071cce..532545b9488e 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>  	if (signaled) {
>  		RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
>  	} else {
> +		BUG_ON(fobj->shared_count >= fobj->shared_max);
>  		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
>  		fobj->shared_count++;
>  	}
> @@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
>  	old = reservation_object_get_list(obj);
>  	obj->staged = NULL;
>
> -	if (!fobj) {
> -		BUG_ON(old->shared_count >= old->shared_max);
> +	if (!fobj)
>  		reservation_object_add_shared_inplace(obj, old, fence);
> -	} else
> +	else
>  		reservation_object_add_shared_replace(obj, old, fobj, fence);
>  }
>  EXPORT_SYMBOL(reservation_object_add_shared_fence);
On 2018-07-04 10:31 AM, Christian König wrote:
> On 26.06.2018 16:31, Michel Dänzer wrote:
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> Fixes the BUG_ON spuriously triggering under the following
>> circumstances:
>>
>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>>   the same reservation object, so it calls
>>   reservation_object_reserve_shared with that reservation object once
>>   for each such BO.
>> * In reservation_object_reserve_shared, old->shared_count ==
>>   old->shared_max - 1, so obj->staged is freed in preparation of an
>>   in-place update.
>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>>   once for each of the BOs above, always with the same fence.
>> * The first call adds the fence in the remaining free slot, after which
>>   old->shared_count == old->shared_max.
>
> Well, the explanation here is not correct. For multiple BOs using the
> same reservation object we won't call
> reservation_object_add_shared_fence() multiple times, because we move
> those to the duplicates list in ttm_eu_reserve_buffers().
>
> But this bug can still happen, because we call
> reservation_object_add_shared_fence() manually with fences for the same
> context in a couple of places.
>
> One prominent case which comes to mind is the VM BOs during updates.
> Another possibility is VRAM BOs which need to be cleared.

Thanks. How about the following:

* ttm_eu_reserve_buffers calls reservation_object_reserve_shared.
* In reservation_object_reserve_shared, shared_count == shared_max - 1,
  so obj->staged is freed in preparation of an in-place update.
* ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence,
  after which shared_count == shared_max.
* The amdgpu driver also calls reservation_object_add_shared_fence for
  the same reservation object, and the BUG_ON triggers.
However, nothing bad would happen in
reservation_object_add_shared_inplace, since all fences use the same
context, so they can only occupy a single slot.

Prevent this by moving the BUG_ON to where an overflow would actually
happen (e.g. if a buggy caller didn't call
reservation_object_reserve_shared before).

Also, I'll add a reference to https://bugs.freedesktop.org/106418 in v2,
as I suspect this fix is necessary under the circumstances described
there as well.
On 04.07.2018 11:09, Michel Dänzer wrote:
> On 2018-07-04 10:31 AM, Christian König wrote:
>> On 26.06.2018 16:31, Michel Dänzer wrote:
>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>
>>> Fixes the BUG_ON spuriously triggering under the following
>>> circumstances:
>>>
>>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>>>   the same reservation object, so it calls
>>>   reservation_object_reserve_shared with that reservation object once
>>>   for each such BO.
>>> * In reservation_object_reserve_shared, old->shared_count ==
>>>   old->shared_max - 1, so obj->staged is freed in preparation of an
>>>   in-place update.
>>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>>>   once for each of the BOs above, always with the same fence.
>>> * The first call adds the fence in the remaining free slot, after which
>>>   old->shared_count == old->shared_max.
>>
>> Well, the explanation here is not correct. For multiple BOs using the
>> same reservation object we won't call
>> reservation_object_add_shared_fence() multiple times, because we move
>> those to the duplicates list in ttm_eu_reserve_buffers().
>>
>> But this bug can still happen, because we call
>> reservation_object_add_shared_fence() manually with fences for the same
>> context in a couple of places.
>>
>> One prominent case which comes to mind is the VM BOs during updates.
>> Another possibility is VRAM BOs which need to be cleared.
>
> Thanks. How about the following:
>
> * ttm_eu_reserve_buffers calls reservation_object_reserve_shared.
> * In reservation_object_reserve_shared, shared_count == shared_max - 1,
>   so obj->staged is freed in preparation of an in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence,
>   after which shared_count == shared_max.
> * The amdgpu driver also calls reservation_object_add_shared_fence for
>   the same reservation object, and the BUG_ON triggers.
I would rather completely drop the reference to the ttm_eu_* functions,
because those wrappers are completely unrelated to the problem.

Instead let's just note something like the following:

* reservation_object_reserve_shared is called with shared_count ==
  shared_max - 1, so obj->staged is freed in preparation of an in-place
  update.
* Now reservation_object_add_shared_fence is called with the first
  fence, and after that shared_count == shared_max.
* After that, reservation_object_add_shared_fence can be called with
  follow-up fences from the same context, but since shared_count ==
  shared_max we would run into this BUG_ON.

> However, nothing bad would happen in
> reservation_object_add_shared_inplace, since all fences use the same
> context, so they can only occupy a single slot.
>
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
>
> Also, I'll add a reference to https://bugs.freedesktop.org/106418 in v2,
> as I suspect this fix is necessary under the circumstances described
> there as well.

The rest sounds good to me.

Regards,
Christian.
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 314eb1071cce..532545b9488e 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
 	if (signaled) {
 		RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
 	} else {
+		BUG_ON(fobj->shared_count >= fobj->shared_max);
 		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
 		fobj->shared_count++;
 	}
@@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
 	old = reservation_object_get_list(obj);
 	obj->staged = NULL;
 
-	if (!fobj) {
-		BUG_ON(old->shared_count >= old->shared_max);
+	if (!fobj)
 		reservation_object_add_shared_inplace(obj, old, fence);
-	} else
+	else
 		reservation_object_add_shared_replace(obj, old, fobj, fence);
 }
 EXPORT_SYMBOL(reservation_object_add_shared_fence);