Message ID | 1416854990-1920-22-git-send-email-John.C.Harrison@Intel.com (mailing list archive) |
---|---|
State | New, archived |
On Mon, Nov 24, 2014 at 06:49:43PM +0000, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The ring member of the object structure was always updated with the
> last_read_seqno member. Thus with the conversion to last_read_req, obj->ring is
> now a direct copy of obj->last_read_req->ring. This makes it somewhat redundant
> and potentially misleading (especially as there was no comment to explain its
> purpose).
>
> This checkin removes the redundant field. Many uses were simply testing for
> non-null to see if the object is active on the GPU. Some of these have been
> converted to check 'obj->active' instead. Others (where the last_read_req is
> about to be used anyway) have been changed to check obj->last_read_req. The rest
> simply pull the ring out from the request structure and proceed as before.
>
> For: VIZ-4377
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>

Ok merged up to this for now. I'd like to settle things a bit first (and
also figure out what to do with the trace_irq stuff).

Thanks for patches&review,
Daniel
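For readers skimming the archive, the redundancy described in the commit
message can be illustrated with a minimal sketch. The types below are
simplified stand-ins (hence the _sketch suffixes), not the driver's real
definitions, and the NULL-safe helper only mirrors the accessor style the
diff at the bottom of this page appears to rely on:

#include <stddef.h>

/* Simplified stand-ins; the real definitions in i915_drv.h carry many
 * more fields. */
struct intel_engine_cs {
	const char *name;
	int id;
};

struct drm_i915_gem_request {
	struct intel_engine_cs *ring;	/* engine this request runs on */
};

struct drm_i915_gem_object_sketch {
	/* struct intel_engine_cs *ring;   <-- the removed field: it was
	 * always kept equal to last_read_req->ring, hence redundant. */
	struct drm_i915_gem_request *last_read_req;
	unsigned int active:1;
};

/* NULL-safe accessor in the style the diff uses; the exact upstream
 * definition may differ. An idle object (no outstanding read request)
 * simply reads as "no ring". */
static inline struct intel_engine_cs *
i915_gem_request_get_ring_sketch(struct drm_i915_gem_request *req)
{
	return req ? req->ring : NULL;
}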
On 26/11/2014 13:43, Daniel Vetter wrote:
> On Mon, Nov 24, 2014 at 06:49:43PM +0000, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> [commit message quoted in full above, snipped here]
>>
> Ok merged up to this for now. I'd like to settle things a bit first (and
> also figure out what to do with the trace_irq stuff).
>
> Thanks for patches&review,
> Daniel

Now that the 3.19 pull request has gone, are you going to continue merging
these patches? Or is there something else you particularly want to wait for?

Thanks,
John.
On Fri, Nov 28, 2014 at 05:49:26PM +0000, John Harrison wrote:
> On 26/11/2014 13:43, Daniel Vetter wrote:
> >On Mon, Nov 24, 2014 at 06:49:43PM +0000, John.C.Harrison@Intel.com wrote:
> >>From: John Harrison <John.C.Harrison@Intel.com>
> >>
> >>[commit message quoted in full above, snipped here]
> >>
> >Ok merged up to this for now. I'd like to settle things a bit first (and
> >also figure out what to do with the trace_irq stuff).
> >
> >Thanks for patches&review,
> >Daniel
>
> Now that the 3.19 pull request has gone, are you going to continue merging
> these patches? Or is there something else you particularly want to wait for?

Well I'm still not convinced that those patches are what we want. For
android (and a few other things) we really want to support struct fence,
and that already provides all the necessary code to cache the completion
state. So I don't see why we have to do a detour just to get caching into
place if for the long-term plan we'll need to rip all that code out again
anyway. And the scheduler without fence/syncpt integration doesn't make a
lot of sense on Android I think.

Now if there's a serious performance issue with requests without this then
maybe this detour makes sense. But I don't see any benchmark data attached
to justify the patches from that pov.
-Daniel
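For context, the completion-state caching Daniel refers to is built into
the struct fence API (<linux/fence.h> in kernels of this era): the query
helper latches a signaled bit so repeated checks skip the driver callback
entirely. Roughly along these lines (a simplified paraphrase of
fence_is_signaled(), not verbatim kernel source):

#include <linux/fence.h>

static bool is_done(struct fence *fence)
{
	/* Fast path: a previously observed completion is cached in
	 * FENCE_FLAG_SIGNALED_BIT, so no hardware access is needed. */
	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
		return true;

	/* Slow path: ask the driver, and latch a positive answer so
	 * all later queries take the fast path. */
	if (fence->ops->signaled && fence->ops->signaled(fence)) {
		fence_signal(fence);
		return true;
	}

	return false;
}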
On 28/11/2014 18:06, Daniel Vetter wrote:
> On Fri, Nov 28, 2014 at 05:49:26PM +0000, John Harrison wrote:
>> [earlier quote levels snipped; see above]
>>
>> Now that the 3.19 pull request has gone, are you going to continue merging
>> these patches? Or is there something else you particularly want to wait for?
> Well I'm still not convinced that those patches are what we want. For
> android (and a few other things) we really want to support struct fence,
> and that already provides all the necessary code to cache the completion
> state. So I don't see why we have to do a detour just to get caching into
> place if for the long-term plan we'll need to rip all that code out again
> anyway. And the scheduler without fence/syncpt integration doesn't make a
> lot of sense on Android I think.
>
> Now if there's a serious performance issue with requests without this then
> maybe this detour makes sense. But I don't see any benchmark data attached
> to justify the patches from that pov.
> -Daniel

Firstly, not all the missing patches are about performance caching. For
example, the kmalloc vs kzalloc patch is simply better code all round. The
tracing patch is also a general purpose driver improvement and makes
debugging/analysing the code easier.

Re the caching. My argument is that the request implementation, as it
stands, is a significant step backwards over the original seqno version. In
the original code, someone had evidently seen fit to make the completion
test a minimal inline function. In the half merged request version, it is a
function call with register reading and list traversal. As the cached
version is already written, tested and reviewed, I do not see a reason to
discard it in order to write a completely new version at some unspecified
point in the future that may or may not be quite a large piece of work.

Also, the scheduler is already ported on top of the full request patch set
and is running in the GMin Android code base with native sync included.
Having to rewrite the underlying code yet again is going to be a
significant delay in something that needs to be shipped last month.
Delaying that short term work for a long term goal is really undesirable
at this point.

The plan has always been to rework for long term awesomeness (such as
dmabuf sync integration) after something basic is implemented and shipped.

Do you have time for a phone call or meeting to discuss this?

Thanks,
John.
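The contrast John draws can be sketched as follows. The inline test is the
driver's wrap-safe seqno comparison as it exists in i915_drv.h; the cached
variant below is only an illustration of the idea behind the unmerged
patches, with hypothetical field names:

#include <linux/types.h>

/* The original seqno-era completion test: a trivial inline, wrap-safe
 * comparison against a seqno value the caller already has to hand. */
static inline bool i915_seqno_passed(u32 seq1, u32 seq2)
{
	return (s32)(seq1 - seq2) >= 0;
}

/* A cached request-completion test in the spirit of the unmerged
 * patches: once a request is seen to complete, the result is latched so
 * later queries avoid register reads (field names hypothetical). */
struct request_sketch {
	u32 seqno;
	bool completed;		/* sticky: set once, never cleared */
};

static inline bool request_completed(struct request_sketch *req,
				     u32 current_hw_seqno)
{
	if (!req->completed && i915_seqno_passed(current_hw_seqno, req->seqno))
		req->completed = true;
	return req->completed;
}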
On Mon, Dec 01, 2014 at 12:44:12PM +0000, John Harrison wrote:
> On 28/11/2014 18:06, Daniel Vetter wrote:
> >[earlier quote levels snipped; see above]
>
> Firstly, not all the missing patches are about performance caching. For
> example, the kmalloc vs kzalloc patch is simply better code all round. The
> tracing patch is also a general purpose driver improvement and makes
> debugging/analysing the code easier.

No concern with those patches from my side except that they're after the
caching change. I can cherry-pick them but it looks a bit like that might
backfire if I also don't pick the request state caching.

> Re the caching. My argument is that the request implementation, as it
> stands, is a significant step backwards over the original seqno version. In
> the original code, someone had evidently seen fit to make the completion
> test a minimal inline function. In the half merged request version, it is a
> function call with register reading and list traversal. As the cached
> version is already written, tested and reviewed, I do not see a reason to
> discard it in order to write a completely new version at some unspecified
> point in the future that may or may not be quite a large piece of work.
>
> Also, the scheduler is already ported on top of the full request patch set
> and is running in the GMin Android code base with native sync included.
> Having to rewrite the underlying code yet again is going to be a significant
> delay in something that needs to be shipped last month. Delaying that short
> term work for a long term goal is really undesirable at this point. The plan
> has always been to rework for long term awesomeness (such as dmabuf sync
> integration) after something basic is implemented and shipped.
>
> Do you have time for a phone call or meeting to discuss this?

Yeah, probably easier to discuss in a mtg. Jon Ewins should be there too I
think.
-Daniel
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 3dc893c..b4993fc 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -168,8 +168,9 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		*t = '\0';
 		seq_printf(m, " (%s mappable)", s);
 	}
-	if (obj->ring != NULL)
-		seq_printf(m, " (%s)", obj->ring->name);
+	if (obj->last_read_req != NULL)
+		seq_printf(m, " (%s)",
+			   i915_gem_request_get_ring(obj->last_read_req)->name);
 	if (obj->frontbuffer_bits)
 		seq_printf(m, " (frontbuffer: 0x%03x)", obj->frontbuffer_bits);
 }
@@ -336,7 +337,7 @@ static int per_file_stats(int id, void *ptr, void *data)
 			if (ppgtt->file_priv != stats->file_priv)
 				continue;
 
-			if (obj->ring) /* XXX per-vma statistic */
+			if (obj->active) /* XXX per-vma statistic */
 				stats->active += obj->base.size;
 			else
 				stats->inactive += obj->base.size;
@@ -346,7 +347,7 @@ static int per_file_stats(int id, void *ptr, void *data)
 	} else {
 		if (i915_gem_obj_ggtt_bound(obj)) {
 			stats->global += obj->base.size;
-			if (obj->ring)
+			if (obj->active)
 				stats->active += obj->base.size;
 			else
 				stats->inactive += obj->base.size;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1325506..36a4413 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1941,8 +1941,6 @@ struct drm_i915_gem_object {
 	void *dma_buf_vmapping;
 	int vmapping_count;
 
-	struct intel_engine_cs *ring;
-
 	/** Breadcrumb of last rendering to the buffer. */
 	struct drm_i915_gem_request *last_read_req;
 	struct drm_i915_gem_request *last_write_req;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cccca47..68c23dd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2263,14 +2263,18 @@ static void
 i915_gem_object_move_to_active(struct drm_i915_gem_object *obj,
 			       struct intel_engine_cs *ring)
 {
-	struct drm_i915_gem_request *req = intel_ring_get_request(ring);
+	struct drm_i915_gem_request *req;
+	struct intel_engine_cs *old_ring;
 
 	BUG_ON(ring == NULL);
-	if (obj->ring != ring && obj->last_write_req) {
+
+	req = intel_ring_get_request(ring);
+	old_ring = i915_gem_request_get_ring(obj->last_read_req);
+
+	if (old_ring != ring && obj->last_write_req) {
 		/* Keep the request relative to the current ring */
 		i915_gem_request_assign(&obj->last_write_req, req);
 	}
-	obj->ring = ring;
 
 	/* Add a reference if we're newly entering the active list. */
 	if (!obj->active) {
@@ -2309,7 +2313,6 @@ i915_gem_object_move_to_inactive(struct drm_i915_gem_object *obj)
 	intel_fb_obj_flush(obj, true);
 
 	list_del_init(&obj->ring_list);
-	obj->ring = NULL;
 
 	i915_gem_request_assign(&obj->last_read_req, NULL);
 	i915_gem_request_assign(&obj->last_write_req, NULL);
@@ -2326,9 +2329,7 @@ i915_gem_object_move_to_inactive(struct drm_i915_gem_object *obj)
 static void
 i915_gem_object_retire(struct drm_i915_gem_object *obj)
 {
-	struct intel_engine_cs *ring = obj->ring;
-
-	if (ring == NULL)
+	if (obj->last_read_req == NULL)
 		return;
 
 	if (i915_gem_request_completed(obj->last_read_req, true))
@@ -2892,14 +2893,17 @@ i915_gem_idle_work_handler(struct work_struct *work)
 static int
 i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 {
+	struct intel_engine_cs *ring;
 	int ret;
 
 	if (obj->active) {
+		ring = i915_gem_request_get_ring(obj->last_read_req);
+
 		ret = i915_gem_check_olr(obj->last_read_req);
 		if (ret)
 			return ret;
 
-		i915_gem_retire_requests_ring(obj->ring);
+		i915_gem_retire_requests_ring(ring);
 	}
 
 	return 0;
@@ -3002,10 +3006,12 @@ int
 i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		     struct intel_engine_cs *to)
 {
-	struct intel_engine_cs *from = obj->ring;
+	struct intel_engine_cs *from;
 	u32 seqno;
 	int ret, idx;
 
+	from = i915_gem_request_get_ring(obj->last_read_req);
+
 	if (from == NULL || to == from)
 		return 0;
@@ -3964,7 +3970,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	bool was_pin_display;
 	int ret;
 
-	if (pipelined != obj->ring) {
+	if (pipelined != i915_gem_request_get_ring(obj->last_read_req)) {
 		ret = i915_gem_object_sync(obj, pipelined);
 		if (ret)
 			return ret;
@@ -4412,9 +4418,11 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 	ret = i915_gem_object_flush_active(obj);
 
 	args->busy = obj->active;
-	if (obj->ring) {
+	if (obj->last_read_req) {
+		struct intel_engine_cs *ring;
 		BUILD_BUG_ON(I915_NUM_RINGS > 16);
-		args->busy |= intel_ring_flag(obj->ring) << 16;
+		ring = i915_gem_request_get_ring(obj->last_read_req);
+		args->busy |= intel_ring_flag(ring) << 16;
 	}
 
 	drm_gem_object_unreference(&obj->base);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index d17ff43..3c3a9ff 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -619,7 +619,8 @@ static int do_switch(struct intel_engine_cs *ring,
 		 * swapped, but there is no way to do that yet.
 		 */
 		from->legacy_hw_ctx.rcs_state->dirty = 1;
-		BUG_ON(from->legacy_hw_ctx.rcs_state->ring != ring);
+		BUG_ON(i915_gem_request_get_ring(
+			from->legacy_hw_ctx.rcs_state->last_read_req) != ring);
 
 		/* obj is kept alive until the next request by its active ref */
 		i915_gem_object_ggtt_unpin(from->legacy_hw_ctx.rcs_state);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 998bd8a..f3b163c 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -685,7 +685,8 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->dirty = obj->dirty;
 	err->purgeable = obj->madv != I915_MADV_WILLNEED;
 	err->userptr = obj->userptr.mm != NULL;
-	err->ring = obj->ring ? obj->ring->id : -1;
+	err->ring = obj->last_read_req ?
+			i915_gem_request_get_ring(obj->last_read_req)->id : -1;
 	err->cache_level = obj->cache_level;
 }
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 0db41fb..1cf9c82 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -9459,7 +9459,7 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
 	else if (i915.enable_execlists)
 		return true;
 	else
-		return ring != obj->ring;
+		return ring != i915_gem_request_get_ring(obj->last_read_req);
 }
 
 static void skl_do_mmio_flip(struct intel_crtc *intel_crtc)
@@ -9820,7 +9820,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	} else if (IS_IVYBRIDGE(dev)) {
 		ring = &dev_priv->ring[BCS];
 	} else if (INTEL_INFO(dev)->gen >= 7) {
-		ring = obj->ring;
+		ring = i915_gem_request_get_ring(obj->last_read_req);
 		if (ring == NULL || ring->id != RCS)
 			ring = &dev_priv->ring[BCS];
 	} else {
@@ -9842,7 +9842,8 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 		i915_gem_request_assign(&work->flip_queued_req,
 					obj->last_write_req);
-		work->flip_queued_ring = obj->ring;
+		work->flip_queued_ring =
+			i915_gem_request_get_ring(obj->last_write_req);
 	} else {
 		ret = dev_priv->display.queue_flip(dev, crtc, fb, obj, ring,
 						   page_flip_flags);
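Taken together, the hunks above apply the three substitution patterns the
commit message names. A usage sketch, reusing the simplified stand-in types
from earlier in this page (the flag values and the function itself are
purely illustrative, not driver code):

static unsigned int busy_flags_sketch(struct drm_i915_gem_object_sketch *obj,
				      struct intel_engine_cs *target)
{
	unsigned int flags = 0;

	/* 1. Pure activity tests use the existing active flag,
	 *    as in per_file_stats() above. */
	if (obj->active)
		flags |= 0x1;

	/* 2. Call sites that need the request anyway test it directly,
	 *    as in describe_obj() and capture_bo() above. */
	if (obj->last_read_req != NULL)
		flags |= 0x2;

	/* 3. Call sites that need the engine derive it from the request;
	 *    the NULL-safe helper makes idle objects read as "no ring",
	 *    as in use_mmio_flip() above. */
	if (i915_gem_request_get_ring_sketch(obj->last_read_req) != target)
		flags |= 0x4;

	return flags;
}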