Message ID | 1302640318-23165-19-git-send-email-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
> @@ -3002,6 +3002,44 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>  	return 0;
>  }
>  
> +int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> +				    enum i915_cache_level cache_level)
> +{
> +	int ret;
> +
> +	if (obj->cache_level == cache_level)
> +		return 0;
> +
> +	if (obj->gtt_space) {
> +		ret = i915_gem_object_flush_gpu(obj);
> +		if (ret)
> +			return ret;
> +
> +		ret = i915_gem_gtt_bind_object(obj, cache_level);
> +		if (ret)
> +			return ret;

This momentarily confused me till I noticed that the fake AGP driver does
the right thing and does not re-create a DMAR mapping if it already
exists. So much for remembering my own code. Still, maybe extract
i915_gem_gtt_rebind_object from restore_gtt_mappings and use that one
here? That should make the intent clearer.

> +		/* Ensure that we invalidate the GPU's caches and TLBs. */
> +		obj->base.read_domains &= I915_GEM_GPU_DOMAINS;

I can't make sense of this. If we really want to ensure that the GPU
buffers get invalidated on next use, then it's probably

	read_domains &= ~GPU_DOMAINS

and it would fit better grouped together with the call to object_flush_gpu
(the rebind can't actually fail if the DMAR mappings already exist). Or
this is something else and I'm blind.

> +	}
> +
> +	if (cache_level == I915_CACHE_NONE) {
> +		/* If we're coming from LLC cached, then we haven't
> +		 * actually been tracking whether the data is in the
> +		 * CPU cache or not, since we only allow one bit set
> +		 * in obj->write_domain and have been skipping the clflushes.
> +		 * Just set it to the CPU cache for now.
> +		 */
> +		WARN_ON(obj->base.write_domain & I915_GEM_GPU_DOMAINS);
> +
> +		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;

This breaks the invariant that write_domain != 0 implies write_domain ==
read_domains. Yes, if nothing prefetches and we clflush in due time the
caches should still be valid, but paranoid me deems that a bit fragile.

Also, future patches shoot down fences, so we might as well shoot down the
GTT mapping completely. That seems required for the redirect-gtt-mappings
patch, too.

> +		obj->base.write_domain = I915_GEM_DOMAIN_CPU;

We might end up here with a write_domain == DOMAIN_GTT, which feels a tad
unsafe. I'd prefer either a WARN_ON that pushes the problem out to
callers, or a call to flush_gtt_write_domain somewhere in set_cache_level.

This looks like the critical part of the whole patch series, so perhaps
fold the follow-up patches in here, too (like the fence teardown). That
way there's just one spot that requires _really_ careful thinking.

Also, I haven't thought too hard about the uncached->cached transition on
live objects, which is not (yet) required. Maybe some more careful
handling of the GTT domain (mappings teardown) is needed for that.
-Daniel
On Wed, 13 Apr 2011 20:59:46 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> > +int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> > +				    enum i915_cache_level cache_level)
[...]
> > +		ret = i915_gem_gtt_bind_object(obj, cache_level);
> > +		if (ret)
> > +			return ret;
>
> This momentarily confused me till I've noticed that the fake agp driver
> does the right thing and does not re-create a dmar mapping if it already
> exists. So much for remembering my own code. Still, maybe extract
> i915_gem_gtt_rebind_object from restore_gtt_mappings and use that one
> here? Should make the intent clearer.

Now that you reminded me, I was going to ask you at one point whether we
can move the construction of the sg to a separate function. I'm not
convinced we have the sanest of interfaces between the GTT driver and
i915 yet; I think it will be worth revisiting that as our usage patterns
change. [The suggested change is good, of course.]

> > +		/* Ensure that we invalidate the GPU's caches and TLBs. */
> > +		obj->base.read_domains &= I915_GEM_GPU_DOMAINS;
>
> I can't make sense of this. Either we really want to ensure that the gpu
> buffers get invalidated on next use. But then it's probably
>
> 	read_domains &= ~GPU_DOMAINS
>
> and would fit better grouped together with the call to object_flush_gpu
> (the rebind can't actually fail if the dmar mappings already exist). Or
> this is something else and I'm blind.

Gah, typo. Even re-reading what you wrote, I thought you had gone insane.
It was only me who had. ;-)

> > +	if (cache_level == I915_CACHE_NONE) {
[...]
> > +		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
>
> This breaks the invariant that write_domain != 0 implies write_domain ==
> read_domains. Yes, if nothing prefetches and we clflush in due time the
> caches should still be valid, but paranoid me deems that a bit fragile.
>
> > +		obj->base.write_domain = I915_GEM_DOMAIN_CPU;
>
> We might end up here with a write_domain == DOMAIN_GTT. Feels a tad bit
> unsafe. I'd prefer either a WARN_ON and push the problem out to callers or
> to call flush_gtt_write_domain somewhere in set_cache_level.

Right, we can simply flush the GTT write domain and do a complete move
into the CPU domain:

	if (cache_level == I915_CACHE_NONE) {
		/* If we're coming from LLC cached, then we haven't
		 * actually been tracking whether the data is in the
		 * CPU cache or not, since we only allow one bit set
		 * in obj->write_domain and have been skipping the
		 * clflushes. Just set it to the CPU cache for now.
		 */
		WARN_ON(obj->base.write_domain & I915_GEM_GPU_DOMAINS);
		WARN_ON(obj->base.read_domains & I915_GEM_GPU_DOMAINS);

		i915_gem_object_flush_gtt_write_domain(obj);

		obj->base.read_domains = I915_GEM_DOMAIN_CPU;
		obj->base.write_domain = I915_GEM_DOMAIN_CPU;
	}

That should be a no-op, cause no greater impact than having to trigger
the clflushes, and keep the domain tracking consistent.

However, considering the unflushed GTT bug, it does become much simpler...

> This looks like the critical part of the whole patch series so perhaps
> fold the follow-up patches in here, too (like fence teardown). This way
> there's just one spot that requires _really_ careful thinking.
>
> Also, I haven't thought too hard about the uncached->cached transition on
> live objects, which is not (yet) required. Maybe some more careful
> handling of the gtt domain (mappings teardown) is needed for that.

Right, we've already identified that we have a bug here entirely due to
not flushing the GTT!
-Chris
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2536334..2f45228 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1196,9 +1196,13 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file);
 uint32_t
 i915_gem_get_unfenced_gtt_alignment(struct drm_i915_gem_object *obj);
 
+int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
+				    enum i915_cache_level cache_level);
+
 /* i915_gem_gtt.c */
 void i915_gem_restore_gtt_mappings(struct drm_device *dev);
-int __must_check i915_gem_gtt_bind_object(struct drm_i915_gem_object *obj);
+int __must_check i915_gem_gtt_bind_object(struct drm_i915_gem_object *obj,
+					  enum i915_cache_level cache_level);
 void i915_gem_gtt_unbind_object(struct drm_i915_gem_object *obj);
 
 /* i915_gem_evict.c */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fa483d8..9027ee4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2831,7 +2831,7 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 		return ret;
 	}
 
-	ret = i915_gem_gtt_bind_object(obj);
+	ret = i915_gem_gtt_bind_object(obj, obj->cache_level);
 	if (ret) {
 		i915_gem_object_put_pages_gtt(obj);
 		drm_mm_put_block(obj->gtt_space);
@@ -3002,6 +3002,44 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	return 0;
 }
 
+int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
+				    enum i915_cache_level cache_level)
+{
+	int ret;
+
+	if (obj->cache_level == cache_level)
+		return 0;
+
+	if (obj->gtt_space) {
+		ret = i915_gem_object_flush_gpu(obj);
+		if (ret)
+			return ret;
+
+		ret = i915_gem_gtt_bind_object(obj, cache_level);
+		if (ret)
+			return ret;
+
+		/* Ensure that we invalidate the GPU's caches and TLBs. */
+		obj->base.read_domains &= I915_GEM_GPU_DOMAINS;
+	}
+
+	if (cache_level == I915_CACHE_NONE) {
+		/* If we're coming from LLC cached, then we haven't
+		 * actually been tracking whether the data is in the
+		 * CPU cache or not, since we only allow one bit set
+		 * in obj->write_domain and have been skipping the clflushes.
+		 * Just set it to the CPU cache for now.
+		 */
+		WARN_ON(obj->base.write_domain & I915_GEM_GPU_DOMAINS);
+
+		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
+		obj->base.write_domain = I915_GEM_DOMAIN_CPU;
+	}
+
+	obj->cache_level = cache_level;
+	return 0;
+}
+
 /*
  * Prepare buffer for display plane. Use uninterruptible for possible flush
  * wait, as in modesetting process we're not supposed to be interrupted.
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2a1f8f1..6505617 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -77,7 +77,8 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
 	intel_gtt_chipset_flush();
 }
 
-int i915_gem_gtt_bind_object(struct drm_i915_gem_object *obj)
+int i915_gem_gtt_bind_object(struct drm_i915_gem_object *obj,
+			     enum i915_cache_level cache_level)
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index eab2565..f15d80f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -236,7 +236,8 @@ init_pipe_control(struct intel_ring_buffer *ring)
 		ret = -ENOMEM;
 		goto err;
 	}
-	obj->cache_level = I915_CACHE_LLC;
+
+	i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
 
 	ret = i915_gem_object_pin(obj, 4096, true);
 	if (ret)
@@ -759,7 +760,8 @@ static int init_status_page(struct intel_ring_buffer *ring)
 		ret = -ENOMEM;
 		goto err;
 	}
-	obj->cache_level = I915_CACHE_LLC;
+
+	i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
 
 	ret = i915_gem_object_pin(obj, 4096, true);
 	if (ret != 0) {