Message ID | 1398730708-3278-1-git-send-email-benjamin.widawsky@intel.com (mailing list archive) |
---|---|
State | New, archived |
On Mon, Apr 28, 2014 at 05:18:28PM -0700, Ben Widawsky wrote:
> All the rest of the code to enable this is in my branch. Without my
> branch, hitting > 32b offsets is impossible. The code has always
> "supported" 64b, but it's never actually been run or tested. This change
> doesn't actually fix anything. [1] I am not sure why X won't work yet. I
> do not get hangs or obvious errors.
>
> There are 3 fixes grouped together here. First is to remove the
> hardcoded 0 for the upper dword of the relocation. The next fix is to
> use a 64b value for target_offset. The final fix is to not directly
> apply target_offset to reloc->delta. reloc->delta is part of ABI, and so
> we cannot change it. As it stands, 32b is enough to represent everything
> we're interested in representing anyway. The main problem is, we cannot
> add greater than 32b values to it directly.
>
> [1] Almost all of intel-gpu-tools is not yet ready to test 64b
> relocations. There are a few places that expect 32b values for offsets
> and these all won't work.
>
> Cc: Rafael Barbalho <rafael.barbalho@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Seriously, we did this? I am ashamed. I was annoyed by the original
assertion that no userspace was ready in the first place, and to see
that the code was a complete farce anyway...
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 23 +++++++++++++----------
>  1 file changed, 13 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 0d806fc..6ffecd2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -262,10 +262,12 @@ static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
>
>  static int
>  relocate_entry_cpu(struct drm_i915_gem_object *obj,
> -		   struct drm_i915_gem_relocation_entry *reloc)
> +		   struct drm_i915_gem_relocation_entry *reloc,
> +		   uint64_t target_offset)
>  {
>  	struct drm_device *dev = obj->base.dev;
>  	uint32_t page_offset = offset_in_page(reloc->offset);
> +	uint64_t delta = reloc->delta + target_offset;

I would not have called the final value delta, but target_offset. I was
going to quibble over the use of a local variable instead of
reloc->delta, but you successfully argued in my head that your way was
less obtuse.
-Chris
On Thu, May 01, 2014 at 09:04:50AM +0100, Chris Wilson wrote:
> On Mon, Apr 28, 2014 at 05:18:28PM -0700, Ben Widawsky wrote:
> > All the rest of the code to enable this is in my branch. Without my
> > branch, hitting > 32b offsets is impossible. The code has always
> > "supported" 64b, but it's never actually been run or tested. This change
> > doesn't actually fix anything. [1] I am not sure why X won't work yet. I
> > do not get hangs or obvious errors.
> >
> > There are 3 fixes grouped together here. First is to remove the
> > hardcoded 0 for the upper dword of the relocation. The next fix is to
> > use a 64b value for target_offset. The final fix is to not directly
> > apply target_offset to reloc->delta. reloc->delta is part of ABI, and so
> > we cannot change it. As it stands, 32b is enough to represent everything
> > we're interested in representing anyway. The main problem is, we cannot
> > add greater than 32b values to it directly.

Imo if you have a target_offset > 32b in a valid use-case we can bother
to look at this. But not before, since I expect that hw advances will
make this obsolete anyway.

> > [1] Almost all of intel-gpu-tools is not yet ready to test 64b
> > relocations. There are a few places that expect 32b values for offsets
> > and these all won't work.
> >
> > Cc: Rafael Barbalho <rafael.barbalho@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
>
> Seriously, we did this? I am ashamed. I was annoyed by the original
> assertion that no userspace was ready in the first place, and to see
> that the code was a complete farce anyway...

Well my idea was that we try to prep userspace to avoid a needless abi
rev, but it was always clear to me that the kernel side (and igt) is
hopelessly broken for 64b relocs.

> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Queued for -next, thanks for the patch.
-Daniel
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 0d806fc..6ffecd2 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -262,10 +262,12 @@ static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
 
 static int
 relocate_entry_cpu(struct drm_i915_gem_object *obj,
-		   struct drm_i915_gem_relocation_entry *reloc)
+		   struct drm_i915_gem_relocation_entry *reloc,
+		   uint64_t target_offset)
 {
 	struct drm_device *dev = obj->base.dev;
 	uint32_t page_offset = offset_in_page(reloc->offset);
+	uint64_t delta = reloc->delta + target_offset;
 	char *vaddr;
 	int ret;
 
@@ -275,7 +277,7 @@ relocate_entry_cpu(struct drm_i915_gem_object *obj,
 
 	vaddr = kmap_atomic(i915_gem_object_get_page(obj,
 				reloc->offset >> PAGE_SHIFT));
-	*(uint32_t *)(vaddr + page_offset) = reloc->delta;
+	*(uint32_t *)(vaddr + page_offset) = lower_32_bits(delta);
 
 	if (INTEL_INFO(dev)->gen >= 8) {
 		page_offset = offset_in_page(page_offset + sizeof(uint32_t));
@@ -286,7 +288,7 @@ relocate_entry_cpu(struct drm_i915_gem_object *obj,
 				(reloc->offset + sizeof(uint32_t)) >> PAGE_SHIFT));
 		}
 
-		*(uint32_t *)(vaddr + page_offset) = 0;
+		*(uint32_t *)(vaddr + page_offset) = upper_32_bits(delta);
 	}
 
 	kunmap_atomic(vaddr);
@@ -296,10 +298,12 @@ relocate_entry_cpu(struct drm_i915_gem_object *obj,
 
 static int
 relocate_entry_gtt(struct drm_i915_gem_object *obj,
-		   struct drm_i915_gem_relocation_entry *reloc)
+		   struct drm_i915_gem_relocation_entry *reloc,
+		   uint64_t target_offset)
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	uint64_t delta = reloc->delta + target_offset;
 	uint32_t __iomem *reloc_entry;
 	void __iomem *reloc_page;
 	int ret;
@@ -318,7 +322,7 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
 			reloc->offset & PAGE_MASK);
 	reloc_entry = (uint32_t __iomem *)
 		(reloc_page + offset_in_page(reloc->offset));
-	iowrite32(reloc->delta, reloc_entry);
+	iowrite32(lower_32_bits(delta), reloc_entry);
 
 	if (INTEL_INFO(dev)->gen >= 8) {
 		reloc_entry += 1;
@@ -331,7 +335,7 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
 			reloc_entry = reloc_page;
 		}
 
-		iowrite32(0, reloc_entry);
+		iowrite32(upper_32_bits(delta), reloc_entry);
 	}
 
 	io_mapping_unmap_atomic(reloc_page);
@@ -348,7 +352,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 	struct drm_gem_object *target_obj;
 	struct drm_i915_gem_object *target_i915_obj;
 	struct i915_vma *target_vma;
-	uint32_t target_offset;
+	uint64_t target_offset;
 	int ret;
 
 	/* we've already hold a reference to all valid objects */
@@ -427,11 +431,10 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 	if (obj->active && in_atomic())
 		return -EFAULT;
 
-	reloc->delta += target_offset;
 	if (use_cpu_reloc(obj))
-		ret = relocate_entry_cpu(obj, reloc);
+		ret = relocate_entry_cpu(obj, reloc, target_offset);
 	else
-		ret = relocate_entry_gtt(obj, reloc);
+		ret = relocate_entry_gtt(obj, reloc, target_offset);
 
 	if (ret)
 		return ret;
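Both relocation paths in the patch write the low dword first and, on gen >= 8, the high dword four bytes later, which may land on the following page. The following is a rough userspace sketch of that idea only, not the driver code: `sketch_obj`, `write_dword`, `write_reloc64` and `SKETCH_PAGE_SIZE` are hypothetical names, and the page lookup is recomputed per dword instead of remapping with `kmap_atomic()` as the kernel does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, for illustration only */

/* Hypothetical backing store: an array of separately addressed pages. */
struct sketch_obj {
	uint8_t (*pages)[SKETCH_PAGE_SIZE];
	size_t npages;
};

/* Write one dword; the page is looked up from the byte offset each time. */
static void write_dword(struct sketch_obj *obj, uint64_t offset, uint32_t val)
{
	size_t page = (size_t)(offset / SKETCH_PAGE_SIZE);
	size_t in_page = (size_t)(offset % SKETCH_PAGE_SIZE);

	/* A dword-aligned offset guarantees a single dword never straddles pages. */
	assert(page < obj->npages && in_page % 4 == 0);
	memcpy(&obj->pages[page][in_page], &val, sizeof(val));
}

/* Write a 64b address as two dwords: low half first, then high half. */
static void write_reloc64(struct sketch_obj *obj, uint64_t offset, uint64_t addr)
{
	write_dword(obj, offset, (uint32_t)addr);
	/* The second dword may fall on the next page when offset is page_end - 4. */
	write_dword(obj, offset + sizeof(uint32_t), (uint32_t)(addr >> 32));
}
```

Calling `write_reloc64()` with an offset of `SKETCH_PAGE_SIZE - 4` puts the low dword at the end of page 0 and the high dword at the start of page 1, which is the case the `page_offset == 0` remap in the patch context appears to handle.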
All the rest of the code to enable this is in my branch. Without my
branch, hitting > 32b offsets is impossible. The code has always
"supported" 64b, but it's never actually been run or tested. This change
doesn't actually fix anything. [1] I am not sure why X won't work yet. I
do not get hangs or obvious errors.

There are 3 fixes grouped together here. First is to remove the
hardcoded 0 for the upper dword of the relocation. The next fix is to
use a 64b value for target_offset. The final fix is to not directly
apply target_offset to reloc->delta. reloc->delta is part of ABI, and so
we cannot change it. As it stands, 32b is enough to represent everything
we're interested in representing anyway. The main problem is, we cannot
add greater than 32b values to it directly.

[1] Almost all of intel-gpu-tools is not yet ready to test 64b
relocations. There are a few places that expect 32b values for offsets
and these all won't work.

Cc: Rafael Barbalho <rafael.barbalho@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)
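To make the three fixes in the commit message concrete, here is a small userspace sketch, not the kernel code itself. `mini_reloc`, `reloc_old`, `reloc_new`, `lo32` and `hi32` are hypothetical stand-ins; `lo32`/`hi32` play the role of the kernel's `lower_32_bits()`/`upper_32_bits()`. It contrasts accumulating into the 32b ABI field with summing into a 64b local.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for lower_32_bits()/upper_32_bits(). */
static inline uint32_t lo32(uint64_t v) { return (uint32_t)v; }
static inline uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }

/* Hypothetical miniature of the ABI struct: only the 32b delta matters here. */
struct mini_reloc {
	uint32_t delta;
};

/* Pre-patch shape: 32b target_offset, ABI field modified, upper dword forced to 0. */
static void reloc_old(struct mini_reloc *r, uint32_t target_offset, uint32_t out[2])
{
	r->delta += target_offset;	/* wraps at 32 bits and clobbers the ABI field */
	out[0] = r->delta;
	out[1] = 0;
}

/* Post-patch shape: 64b local sum, ABI field left untouched. */
static void reloc_new(const struct mini_reloc *r, uint64_t target_offset, uint32_t out[2])
{
	uint64_t delta = r->delta + target_offset;	/* promoted to 64b before adding */

	out[0] = lo32(delta);
	out[1] = hi32(delta);
}

/* Usage example: the same inputs through both shapes. */
static void mini_reloc_demo(void)
{
	struct mini_reloc a = { 0x10 }, b = { 0x10 };
	uint32_t out[2];

	reloc_old(&a, (uint32_t)0x100001000ULL, out);	/* high bits lost at the call */
	assert(out[1] == 0 && a.delta != 0x10);

	reloc_new(&b, 0x100001000ULL, out);
	assert(out[1] == 1 && b.delta == 0x10);
}
```

In the old shape the high bits are lost twice over: once when the offset is narrowed to 32 bits and again when the upper dword is written as a constant 0. Keeping the sum in a local also means the value userspace passed in `delta` survives the call.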