Message ID: 1478486854-30715-1-git-send-email-mario.kleiner.de@gmail.com (mailing list archive)
State: New, archived
On 07/11/16 11:47 AM, Mario Kleiner wrote:
> External clients which import our bo's wait only
> for exclusive dmabuf-fences, not on shared ones,
> so attach fences on exported buffers as exclusive
> ones, if the buffers get imported by an external
> client.
>
> See discussion in thread:
> https://lists.freedesktop.org/archives/dri-devel/2016-October/122370.html
>
> Tested on Intel iGPU + AMD Tonga dGPU as DRI3/Present
> Prime render offload, and with the Tonga standalone as
> primary gpu.
>
> v2: Add a wait for all shared fences before prime export,
>     as suggested by Christian Koenig.
>
> v3: - Mark buffer prime_exported in amdgpu_gem_prime_pin,
>       so we only use the exclusive fence when exporting a
>       bo to external clients like a separate iGPU, but not
>       when exporting/importing from/to ourselves as part of
>       regular DRI3 fd passing.
>
>     - Propagate failure of reservation_object_wait_rcu back
>       to caller.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95472
> (v1) Tested-by: Mike Lothian <mike@fireburn.co.uk>

FWIW, I think the (v1) is usually added at the end of the line, not at
the beginning.

[...]

> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> index 7700dc2..c4494d0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> @@ -81,6 +81,20 @@ int amdgpu_gem_prime_pin(struct drm_gem_object *obj)
>  {
>  	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
>  	int ret = 0;
> +	long lret;
> +
> +	/*
> +	 * Wait for all shared fences to complete before we switch to future
> +	 * use of exclusive fence on this prime_exported bo.
> +	 */
> +	lret = reservation_object_wait_timeout_rcu(bo->tbo.resv, true, false,
> +						   MAX_SCHEDULE_TIMEOUT);
> +	if (unlikely(lret < 0)) {
> +		DRM_DEBUG_PRIME("Fence wait failed: %li\n", lret);
> +		return lret;
> +	}
> +
> +	bo->prime_exported = true;

We should probably clear bo->prime_exported in amdgpu_gem_prime_unpin.

Also, I think we should set bo->prime_exported (prime_shared?) in
amdgpu_gem_prime_import_sg_table as well, so that we'll also set
exclusive fences on BOs imported from other GPUs.
Am 07.11.2016 um 04:29 schrieb Michel Dänzer:
> On 07/11/16 11:47 AM, Mario Kleiner wrote:
>> [...]
>
> [...]
>
> We should probably clear bo->prime_exported in amdgpu_gem_prime_unpin.
>
> Also, I think we should set bo->prime_exported (prime_shared?) in
> amdgpu_gem_prime_import_sg_table as well, so that we'll also set
> exclusive fences on BOs imported from other GPUs.

Yes, really good idea.

Additionally, are we sure that amdgpu_gem_prime_pin() is only called
once for each GEM object? It could in theory be that somebody exports
a BO to another GFX device as well as a V4L device, for example.

In this case we would need a counter, but I need to double check that
as well.

In general I would protect the flag/counter by the BO being reserved,
to avoid any races. In other words, move those lines a bit down, after
the amdgpu_bo_reserve().

Regards,
Christian.
On 11/07/2016 08:55 AM, Christian König wrote:
> Am 07.11.2016 um 04:29 schrieb Michel Dänzer:
>> On 07/11/16 11:47 AM, Mario Kleiner wrote:
>>> [...]
>>
>> FWIW, I think the (v1) is usually added at the end of the line, not at
>> the beginning.

Ok.

>> [...]
>>
>> We should probably clear bo->prime_exported in amdgpu_gem_prime_unpin.
>>
>> Also, I think we should set bo->prime_exported (prime_shared?) in
>> amdgpu_gem_prime_import_sg_table as well, so that we'll also set
>> exclusive fences on BOs imported from other GPUs.
>
> Yes, really good idea.

Ok. I guess we don't need to wait for fences there before setting the
flag/counter, as we can't have done anything yet with the bo just
created from the imported sg_table?

> Additional to that are we sure that amdgpu_gem_prime_pin() is only
> called once for each GEM object? Could in theory be that somebody
> exports a BO to another GFX device as well as a V4L device for example.
>
> In this case we would need a counter, but I need to double check that as
> well.

If we want to clear the prime_exported flag in amdgpu_gem_prime_unpin()
as an optimization if all clients detach from the buffer, then we need a
prime_shared_counter instead of a prime_exported flag afaics. The dmabuf
api allows multiple clients to import and attach to an exported dmabuf,
each one calling dma_buf_attach, which will call amdgpu_gem_prime_pin.

> In general I would protected the flag/counter by the BO being reserved
> to avoid any races. In other word move those lines a bit down after the
> amdgpu_bo_reserve().

Ok.

-mario

> Regards,
> Christian.
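The flag-vs-counter point above can be illustrated with a small
self-contained userspace model (all names here — model_bo,
prime_shared_count, the model_* functions — are hypothetical stand-ins,
not the actual amdgpu code):

```c
#include <assert.h>

/*
 * Hypothetical model of the proposed prime_shared_count: each
 * dma_buf_attach-driven pin bumps the count, each detach-driven unpin
 * drops it.  A plain boolean would be cleared by the first detach even
 * while another importer (e.g. a V4L device) is still attached; a
 * counter only drops to zero once the last importer detaches.
 */
struct model_bo {
	unsigned int prime_shared_count;	/* 0 == no external importers */
};

/* Stand-in for the counter bump in amdgpu_gem_prime_pin(). */
static void model_prime_pin(struct model_bo *bo)
{
	bo->prime_shared_count++;
}

/* Stand-in for the counter drop in amdgpu_gem_prime_unpin(). */
static void model_prime_unpin(struct model_bo *bo)
{
	assert(bo->prime_shared_count > 0);
	bo->prime_shared_count--;
}

/*
 * New fences for this BO would go on the exclusive slot only while at
 * least one external importer is attached.
 */
static int model_needs_exclusive_fence(const struct model_bo *bo)
{
	return bo->prime_shared_count > 0;
}
```

With two importers attached (say an iGPU and a V4L device), the first
detach still leaves the BO fencing exclusively; only the second detach
restores shared fencing, which a single boolean cannot express.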
Am 07.11.2016 um 20:20 schrieb Mario Kleiner:
> On 11/07/2016 08:55 AM, Christian König wrote:
>> Am 07.11.2016 um 04:29 schrieb Michel Dänzer:
>>> Also, I think we should set bo->prime_exported (prime_shared?) in
>>> amdgpu_gem_prime_import_sg_table as well, so that we'll also set
>>> exclusive fences on BOs imported from other GPUs.
>>
>> Yes, really good idea.
>
> Ok. I guess we don't need to wait for fences there before setting the
> flag/counter, as we can't have done anything yet with the bo just
> created from the imported sg_table?

Correct, waiting at this point shouldn't be necessary.

Regards,
Christian.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 039b57e..a337d56 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -459,6 +459,7 @@ struct amdgpu_bo {
 	u64				metadata_flags;
 	void				*metadata;
 	u32				metadata_size;
+	bool				prime_exported;
 
 	/* list of all virtual address to which this bo
 	 * is associated to */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index 651115d..51c6f60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -132,7 +132,7 @@ static int amdgpu_bo_list_set(struct amdgpu_device *adev,
 		entry->priority = min(info[i].bo_priority,
 				      AMDGPU_BO_LIST_MAX_PRIORITY);
 		entry->tv.bo = &entry->robj->tbo;
-		entry->tv.shared = true;
+		entry->tv.shared = !entry->robj->prime_exported;
 
 		if (entry->robj->prefered_domains == AMDGPU_GEM_DOMAIN_GDS)
 			gds_obj = entry->robj;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 7700dc2..c4494d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -81,6 +81,20 @@ int amdgpu_gem_prime_pin(struct drm_gem_object *obj)
 {
 	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
 	int ret = 0;
+	long lret;
+
+	/*
+	 * Wait for all shared fences to complete before we switch to future
+	 * use of exclusive fence on this prime_exported bo.
+	 */
+	lret = reservation_object_wait_timeout_rcu(bo->tbo.resv, true, false,
+						   MAX_SCHEDULE_TIMEOUT);
+	if (unlikely(lret < 0)) {
+		DRM_DEBUG_PRIME("Fence wait failed: %li\n", lret);
+		return lret;
+	}
+
+	bo->prime_exported = true;
 
 	ret = amdgpu_bo_reserve(bo, false);
 	if (unlikely(ret != 0))
External clients which import our bo's wait only
for exclusive dmabuf-fences, not on shared ones,
so attach fences on exported buffers as exclusive
ones, if the buffers get imported by an external
client.

See discussion in thread:
https://lists.freedesktop.org/archives/dri-devel/2016-October/122370.html

Tested on Intel iGPU + AMD Tonga dGPU as DRI3/Present
Prime render offload, and with the Tonga standalone as
primary gpu.

v2: Add a wait for all shared fences before prime export,
    as suggested by Christian Koenig.

v3: - Mark buffer prime_exported in amdgpu_gem_prime_pin,
      so we only use the exclusive fence when exporting a
      bo to external clients like a separate iGPU, but not
      when exporting/importing from/to ourselves as part of
      regular DRI3 fd passing.

    - Propagate failure of reservation_object_wait_rcu back
      to caller.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95472
(v1) Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h         |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c   | 14 ++++++++++++++
 3 files changed, 16 insertions(+), 1 deletion(-)
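The patch's core flow — drain all pending shared fences once at export
time, then fence all future work on this BO exclusively — can be
sketched as a toy userspace model. All names (toy_resv, toy_bo, the
toy_* functions) are hypothetical illustrations, not the real
reservation_object or amdgpu API:

```c
#include <assert.h>

/*
 * Toy reservation object: tracks how many shared fences are pending
 * and whether an exclusive fence is set.  This mirrors the patch's
 * control flow only, not the actual kernel data structures.
 */
struct toy_resv {
	int shared_pending;	/* shared fences not yet signaled */
	int exclusive_set;
};

struct toy_bo {
	struct toy_resv resv;
	int prime_exported;
};

/*
 * Analogue of amdgpu_gem_prime_pin(): wait out all shared fences
 * (the reservation_object_wait_timeout_rcu call in the patch), then
 * mark the BO so later submissions fence exclusively.
 */
static int toy_prime_pin(struct toy_bo *bo)
{
	while (bo->resv.shared_pending > 0)
		bo->resv.shared_pending--;	/* stand-in for the blocking wait */
	bo->prime_exported = 1;
	return 0;
}

/*
 * Analogue of the amdgpu_bo_list_set() hunk: each command submission
 * picks the fence slot based on the export state, matching
 * "entry->tv.shared = !entry->robj->prime_exported".
 */
static void toy_add_fence(struct toy_bo *bo)
{
	if (bo->prime_exported)
		bo->resv.exclusive_set = 1;	/* external importers wait on this */
	else
		bo->resv.shared_pending++;
}
```

The draining step matters: an external importer only waits on the
exclusive fence, so any shared fences still pending at export time
must be completed first or the importer could read the buffer while
our earlier rendering is still in flight.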