Message ID | 1456154308-9342-3-git-send-email-david.s.gordon@intel.com (mailing list archive)
---|---
State | New, archived
[Cc Chris as the author of the idea.] Hi, On 22/02/16 15:18, Dave Gordon wrote: > This is essentially Chris Wilson's patch of a similar name, reworked on > top of Alex Dai's recent patch: > drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space > Chris' original commentary said: > > We now have two implementations for vmapping a whole object, one for > dma-buf and one for the ringbuffer. If we couple the vmapping into > the obj->pages lifetime, then we can reuse an obj->vmapping for both > and at the same time couple it into the shrinker. As a general concept my worry is that by implementing this approach we tie two unrelated concepts together. Shrinker is about backing storage (used/free pages in a machine), while vmap is about kernel address space. And then on 32-bit with its limited vmap space (128MiB, right?), it can become exhausted much sooner than the shrinker would be triggered. And we would rely on the shrinker running to free up address space as well now, right? So unless I am missing something, that doesn't fit well. > > v2: Mark the failable kmalloc() as __GFP_NOWARN (vsyrjala) > v3: Call unpin_vmap from the right dmabuf unmapper > > This patch [v4] reimplements the same functionality, but now as wrappers > round the recently-introduced i915_gem_object_vmap_range() from Alex's > patch mentioned above. > > With this change, we have only one vmap() in the whole driver :) > > Signed-off-by: Dave Gordon <david.s.gordon@intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > --- > drivers/gpu/drm/i915/i915_drv.h | 11 +++++---- > drivers/gpu/drm/i915/i915_gem.c | 39 +++++++++++++++++++++++++++++++ > drivers/gpu/drm/i915/i915_gem_dmabuf.c | 41 ++++----------------------------- > drivers/gpu/drm/i915/intel_ringbuffer.c | 32 ++++++++++++------------- > 4 files changed, 67 insertions(+), 56 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index 8cd47c6..b237ebd 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -2168,10 +2168,7 @@ struct drm_i915_gem_object { > struct scatterlist *sg; > int last; > } get_page; > - > - /* prime dma-buf support */ > - void *dma_buf_vmapping; > - int vmapping_count; > + void *vmapping; > > /** Breadcrumb of last rendering to the buffer. > * There can only be one writer, but we allow for multiple readers.
> @@ -2979,6 +2976,12 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj) > obj->pages_pin_count--; > } > > +void *__must_check i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj); > +static inline void i915_gem_object_unpin_vmap(struct drm_i915_gem_object *obj) > +{ > + i915_gem_object_unpin_pages(obj); > +} > + > void *__must_check i915_gem_object_vmap_range(struct drm_i915_gem_object *obj, > unsigned int first, > unsigned int npages); > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index 0d5709a..14942cf 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -2235,6 +2235,11 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj) > ops->put_pages(obj); > obj->pages = NULL; > > + if (obj->vmapping) { > + vunmap(obj->vmapping); > + obj->vmapping = NULL; > + } > + > i915_gem_object_invalidate(obj); > > return 0; > @@ -2447,6 +2452,40 @@ void *i915_gem_object_vmap_range(struct drm_i915_gem_object *obj, > return addr; > } > > +/** > + * i915_gem_object_pin_vmap - pin a GEM object and map it into kernel space > + * @obj: the GEM object to be mapped > + * > + * Conbines the functions of get_pages(), pin_pages() and vmap_range() on > + * the whole object. The caller should release the mappoing by calling > + * i915_gem_object_unpin_vmap() when the object is no longer required. > + * function. "Conbines", "mappoing" and a dangling "function." sentence :) "when the mapping is no longer required." probably would be better. Return value also needs to be documented. > + * > + * Returns an ERR_PTR on failure. > + */ > +void *i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj) > +{ > + int ret; > + > + ret = i915_gem_object_get_pages(obj); > + if (ret) > + return ERR_PTR(ret); > + > + i915_gem_object_pin_pages(obj); > + > + if (obj->vmapping == NULL) { > + obj->vmapping = i915_gem_object_vmap_range(obj, 0, > + obj->base.size >> PAGE_SHIFT); > + > + if (obj->vmapping == NULL) { > + i915_gem_object_unpin_pages(obj); > + return ERR_PTR(-ENOMEM); > + } > + } > + > + return obj->vmapping; > +} > + > void i915_vma_move_to_active(struct i915_vma *vma, > struct drm_i915_gem_request *req) > { > diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c > index aee4149..adc7b5e 100644 > --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c > +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c > @@ -95,14 +95,12 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment, > { > struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf); > > - mutex_lock(&obj->base.dev->struct_mutex); > - > dma_unmap_sg(attachment->dev, sg->sgl, sg->nents, dir); > sg_free_table(sg); > kfree(sg); > > + mutex_lock(&obj->base.dev->struct_mutex); > i915_gem_object_unpin_pages(obj); > - > mutex_unlock(&obj->base.dev->struct_mutex); > } An unrelated hunk like this would better fit a separate patch, so it can be explained why it is OK. Or better, I would suggest dropping it from this patch.
> @@ -110,41 +108,17 @@ static void *i915_gem_dmabuf_vmap(struct dma_buf *dma_buf) > { > struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf); > struct drm_device *dev = obj->base.dev; > + void *addr; > int ret; > > ret = i915_mutex_lock_interruptible(dev); > if (ret) > return ERR_PTR(ret); > > - if (obj->dma_buf_vmapping) { > - obj->vmapping_count++; > - goto out_unlock; > - } > - > - ret = i915_gem_object_get_pages(obj); > - if (ret) > - goto err; > - > - i915_gem_object_pin_pages(obj); > - > - ret = -ENOMEM; > - > - obj->dma_buf_vmapping = i915_gem_object_vmap_range(obj, 0, > - dma_buf->size >> PAGE_SHIFT); > - > - if (!obj->dma_buf_vmapping) > - goto err_unpin; > - > - obj->vmapping_count = 1; > -out_unlock: > + addr = i915_gem_object_pin_vmap(obj); > mutex_unlock(&dev->struct_mutex); > - return obj->dma_buf_vmapping; > > -err_unpin: > - i915_gem_object_unpin_pages(obj); > -err: > - mutex_unlock(&dev->struct_mutex); > - return ERR_PTR(ret); > + return addr; > } > > static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr) > @@ -153,12 +127,7 @@ static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr) > struct drm_device *dev = obj->base.dev; > > mutex_lock(&dev->struct_mutex); > - if (--obj->vmapping_count == 0) { > - vunmap(obj->dma_buf_vmapping); > - obj->dma_buf_vmapping = NULL; > - > - i915_gem_object_unpin_pages(obj); > - } > + i915_gem_object_unpin_vmap(obj); > mutex_unlock(&dev->struct_mutex); > } > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > index 434a452..47f186e 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > @@ -2056,7 +2056,7 @@ static int init_phys_status_page(struct intel_engine_cs *ring) > void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf) > { > if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen) > - vunmap(ringbuf->virtual_start); > + i915_gem_object_unpin_vmap(ringbuf->obj); > else > iounmap(ringbuf->virtual_start); > ringbuf->virtual_start = NULL; > @@ -2077,16 +2077,14 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev, > return ret; > > ret = i915_gem_object_set_to_cpu_domain(obj, true); > - if (ret) { > - i915_gem_object_ggtt_unpin(obj); > - return ret; > - } > + if (ret) > + goto unpin; > > - ringbuf->virtual_start = i915_gem_object_vmap_range(obj, 0, > - ringbuf->size >> PAGE_SHIFT); > - if (ringbuf->virtual_start == NULL) { > - i915_gem_object_ggtt_unpin(obj); > - return -ENOMEM; > + ringbuf->virtual_start = i915_gem_object_pin_vmap(obj); > + if (IS_ERR(ringbuf->virtual_start)) { > + ret = PTR_ERR(ringbuf->virtual_start); > + ringbuf->virtual_start = NULL; > + goto unpin; > } > } else { > ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE); > @@ -2094,10 +2092,8 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev, > return ret; > > ret = i915_gem_object_set_to_gtt_domain(obj, true); > - if (ret) { > - i915_gem_object_ggtt_unpin(obj); > - return ret; > - } > + if (ret) > + goto unpin; > > /* Access through the GTT requires the device to be awake. 
*/ > assert_rpm_wakelock_held(dev_priv); > @@ -2105,14 +2101,18 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev, > ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base + > i915_gem_obj_ggtt_offset(obj), ringbuf->size); > if (ringbuf->virtual_start == NULL) { > - i915_gem_object_ggtt_unpin(obj); > - return -EINVAL; > + ret = -ENOMEM; > + goto unpin; Another refactoring not really belonging to this patch. I am not sure it is a good idea to share the cleanup path between the two logically split branches; it may be fragile in the future. But it is short enough, so OK. But as a related question, I wonder why the LRC doesn't require the same !HAS_LLC approach when mapping as the ring buffer does here? > } > } > > ringbuf->vma = i915_gem_obj_to_ggtt(obj); > > return 0; > + > +unpin: > + i915_gem_object_ggtt_unpin(obj); > + return ret; > } > > static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf) > Regards, Tvrtko
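For reference, the caller-side contract the patch introduces, condensed from the two converted call sites (a sketch only; struct_mutex must be held, as it is in both the dma-buf and ringbuffer paths):

    void *vaddr;

    vaddr = i915_gem_object_pin_vmap(obj);
    if (IS_ERR(vaddr))
            return PTR_ERR(vaddr);

    /* ... CPU access through vaddr ... */

    i915_gem_object_unpin_vmap(obj);   /* drops only the pages pin; the
                                        * vmapping itself persists until
                                        * put_pages(), e.g. via the shrinker */

That lazy-release behaviour is precisely what the review above questions: kernel address space is only returned when the backing pages are eventually dropped.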
On Mon, Feb 22, 2016 at 04:06:39PM +0000, Tvrtko Ursulin wrote: > > [Cc Chris as the author of the idea.] > > Hi, > > On 22/02/16 15:18, Dave Gordon wrote: > >This is essentially Chris Wilson's patch of a similar name, reworked on > >top of Alex Dai's recent patch: > > drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space > >Chris' original commentary said: > > > > We now have two implementations for vmapping a whole object, one for > > dma-buf and one for the ringbuffer. If we couple the vmapping into > > the obj->pages lifetime, then we can reuse an obj->vmapping for both > > and at the same time couple it into the shrinker. > > As a general concept my worry is that by implementing this approach > we tie two unrelated concepts together. > > Shrinker is about backing storage (used/free pages in a machine), > while vmap is about kernel address space. And then on 32-bit with > its limited vmap space (128MiB, right?), it can become exhausted > much sooner that the shrinker would be triggered. And we would rely > on the shrinker running to free up address space as well now, right? Yes, we use the shrinker to free address space. > So unless I am missing something that doesn't fit well. The opposite. Even more reason for the shrinker to be able to recover vmap space on 32bit systems (for external users, not just ourselves). > >diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > >index 434a452..47f186e 100644 > >--- a/drivers/gpu/drm/i915/intel_ringbuffer.c > >+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > >@@ -2056,7 +2056,7 @@ static int init_phys_status_page(struct intel_engine_cs *ring) > > void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf) > > { > > if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen) > >- vunmap(ringbuf->virtual_start); > >+ i915_gem_object_unpin_vmap(ringbuf->obj); > > else > > iounmap(ringbuf->virtual_start); > > ringbuf->virtual_start = NULL; > >@@ -2077,16 +2077,14 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev, > > return ret; > > > > ret = i915_gem_object_set_to_cpu_domain(obj, true); > >- if (ret) { > >- i915_gem_object_ggtt_unpin(obj); > >- return ret; > >- } > >+ if (ret) > >+ goto unpin; > > > >- ringbuf->virtual_start = i915_gem_object_vmap_range(obj, 0, > >- ringbuf->size >> PAGE_SHIFT); > >- if (ringbuf->virtual_start == NULL) { > >- i915_gem_object_ggtt_unpin(obj); > >- return -ENOMEM; > >+ ringbuf->virtual_start = i915_gem_object_pin_vmap(obj); > >+ if (IS_ERR(ringbuf->virtual_start)) { > >+ ret = PTR_ERR(ringbuf->virtual_start); > >+ ringbuf->virtual_start = NULL; > >+ goto unpin; > > } > > } else { > > ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE); > >@@ -2094,10 +2092,8 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev, > > return ret; > > > > ret = i915_gem_object_set_to_gtt_domain(obj, true); > >- if (ret) { > >- i915_gem_object_ggtt_unpin(obj); > >- return ret; > >- } > >+ if (ret) > >+ goto unpin; > > > > /* Access through the GTT requires the device to be awake. */ > > assert_rpm_wakelock_held(dev_priv); > >@@ -2105,14 +2101,18 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev, > > ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base + > > i915_gem_obj_ggtt_offset(obj), ringbuf->size); > > if (ringbuf->virtual_start == NULL) { > >- i915_gem_object_ggtt_unpin(obj); > >- return -EINVAL; > >+ ret = -ENOMEM; > >+ goto unpin; > > Another refactoring not really belonging to this patch. 
I am not > sure it is that good to share the cleanup path from the two > logically split branches. May be fragile in the future. But it is > short enough so OK. > > But as a related question, I wonder why doesn't the LRC require the > same !HAS_LLC approach when mapping as ring buffer does here? We don't try to use stolen for LRC. The main difficulty lies in deciding how to map the state object: stolen forces us to use an ioremapping through the GTT, which is only suitable for write-only mappings. However, we could be using the per-context HWS, for which we want a CPU-accessible, direct pointer. -Chris
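The mapping split under discussion, condensed from the ringbuffer diff quoted above (a sketch of the two branches of intel_pin_and_map_ringbuffer_obj() after the patch; the pinning and domain setup around them are elided):

    if (HAS_LLC(dev) && !obj->stolen)
            /* cacheable CPU mapping of the object's own pages */
            ringbuf->virtual_start = i915_gem_object_pin_vmap(obj);
    else
            /* write-combined mapping through the GTT aperture */
            ringbuf->virtual_start =
                    ioremap_wc(dev_priv->gtt.mappable_base +
                               i915_gem_obj_ggtt_offset(obj),
                               ringbuf->size);

Stolen memory has no struct pages to hand to vmap(), hence the ioremap_wc() path, which is also the write-only GTT access Chris refers to.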
On 23/02/16 10:06, Chris Wilson wrote: > On Mon, Feb 22, 2016 at 04:06:39PM +0000, Tvrtko Ursulin wrote: >> >> [Cc Chris as the author of the idea.] >> >> Hi, >> >> On 22/02/16 15:18, Dave Gordon wrote: >>> This is essentially Chris Wilson's patch of a similar name, reworked on >>> top of Alex Dai's recent patch: >>> drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space >>> Chris' original commentary said: >>> >>> We now have two implementations for vmapping a whole object, one for >>> dma-buf and one for the ringbuffer. If we couple the vmapping into >>> the obj->pages lifetime, then we can reuse an obj->vmapping for both >>> and at the same time couple it into the shrinker. >> >> As a general concept my worry is that by implementing this approach >> we tie two unrelated concepts together. >> >> Shrinker is about backing storage (used/free pages in a machine), >> while vmap is about kernel address space. And then on 32-bit with >> its limited vmap space (128MiB, right?), it can become exhausted >> much sooner that the shrinker would be triggered. And we would rely >> on the shrinker running to free up address space as well now, right? > > Yes, we use the shrinker to free address space. > >> So unless I am missing something that doesn't fit well. > > The opposite. Even more reason for the shrinker to be able to recover > vmap space on 32bit systems (for external users, not just ourselves). How? I don't see that failed vmapping will trigger shrinking. What will prevent i915 from accumulating objects with vmaps sticking around for too long potentially? >>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c >>> index 434a452..47f186e 100644 >>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c >>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c >>> @@ -2056,7 +2056,7 @@ static int init_phys_status_page(struct intel_engine_cs *ring) >>> void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf) >>> { >>> if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen) >>> - vunmap(ringbuf->virtual_start); >>> + i915_gem_object_unpin_vmap(ringbuf->obj); >>> else >>> iounmap(ringbuf->virtual_start); >>> ringbuf->virtual_start = NULL; >>> @@ -2077,16 +2077,14 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev, >>> return ret; >>> >>> ret = i915_gem_object_set_to_cpu_domain(obj, true); >>> - if (ret) { >>> - i915_gem_object_ggtt_unpin(obj); >>> - return ret; >>> - } >>> + if (ret) >>> + goto unpin; >>> >>> - ringbuf->virtual_start = i915_gem_object_vmap_range(obj, 0, >>> - ringbuf->size >> PAGE_SHIFT); >>> - if (ringbuf->virtual_start == NULL) { >>> - i915_gem_object_ggtt_unpin(obj); >>> - return -ENOMEM; >>> + ringbuf->virtual_start = i915_gem_object_pin_vmap(obj); >>> + if (IS_ERR(ringbuf->virtual_start)) { >>> + ret = PTR_ERR(ringbuf->virtual_start); >>> + ringbuf->virtual_start = NULL; >>> + goto unpin; >>> } >>> } else { >>> ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE); >>> @@ -2094,10 +2092,8 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev, >>> return ret; >>> >>> ret = i915_gem_object_set_to_gtt_domain(obj, true); >>> - if (ret) { >>> - i915_gem_object_ggtt_unpin(obj); >>> - return ret; >>> - } >>> + if (ret) >>> + goto unpin; >>> >>> /* Access through the GTT requires the device to be awake. 
*/ >>> assert_rpm_wakelock_held(dev_priv); >>> @@ -2105,14 +2101,18 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev, >>> ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base + >>> i915_gem_obj_ggtt_offset(obj), ringbuf->size); >>> if (ringbuf->virtual_start == NULL) { >>> - i915_gem_object_ggtt_unpin(obj); >>> - return -EINVAL; >>> + ret = -ENOMEM; >>> + goto unpin; >> >> Another refactoring not really belonging to this patch. I am not >> sure it is that good to share the cleanup path from the two >> logically split branches. May be fragile in the future. But it is >> short enough so OK. >> >> But as a related question, I wonder why doesn't the LRC require the >> same !HAS_LLC approach when mapping as ring buffer does here? > > We don't try to use stolen for LRC. The main difficulty lies in > deciding how to map the state object, stolen forces us to use an > ioremapping through the GTT and so only suitable for write-only > mappings. However, we could be using the per-context HWS, for which we > want a CPU accessible, direct pointer. I wasn't asking about stolen but the !HAS_LLC path. Even non-stolen ring buffers will be mapped via the aperture on !HAS_LLC platforms. That implies it is about cache coherency, and we don't have the same treatment for the LRC page, or didn't until your vma->iomap prototype, which added the same uncached access to the LRC as well. So my question was, do we need this for cache considerations today, irrespective of caching the pointer in the VMA. Regards, Tvrtko
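If the coherency concern holds, the TAIL update written through the kmap'd LRC state page would need an explicit flush on !HAS_LLC. A hypothetical sketch only (reg_state and CTX_RING_TAIL as used by the execlists code; drm_clflush_virt_range() is the existing DRM cache helper):

    reg_state[CTX_RING_TAIL+1] = rq->tail;
    if (!HAS_LLC(dev))
            /* push the CPU write out to memory so the GPU observes it */
            drm_clflush_virt_range(&reg_state[CTX_RING_TAIL+1],
                                   sizeof(reg_state[0]));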
On Tue, Feb 23, 2016 at 10:31:08AM +0000, Tvrtko Ursulin wrote: > > On 23/02/16 10:06, Chris Wilson wrote: > >On Mon, Feb 22, 2016 at 04:06:39PM +0000, Tvrtko Ursulin wrote: > >> > >>[Cc Chris as the author of the idea.] > >> > >>Hi, > >> > >>On 22/02/16 15:18, Dave Gordon wrote: > >>>This is essentially Chris Wilson's patch of a similar name, reworked on > >>>top of Alex Dai's recent patch: > >>> drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space > >>>Chris' original commentary said: > >>> > >>> We now have two implementations for vmapping a whole object, one for > >>> dma-buf and one for the ringbuffer. If we couple the vmapping into > >>> the obj->pages lifetime, then we can reuse an obj->vmapping for both > >>> and at the same time couple it into the shrinker. > >> > >>As a general concept my worry is that by implementing this approach > >>we tie two unrelated concepts together. > >> > >>Shrinker is about backing storage (used/free pages in a machine), > >>while vmap is about kernel address space. And then on 32-bit with > >>its limited vmap space (128MiB, right?), it can become exhausted > >>much sooner that the shrinker would be triggered. And we would rely > >>on the shrinker running to free up address space as well now, right? > > > >Yes, we use the shrinker to free address space. > > > >>So unless I am missing something that doesn't fit well. > > > >The opposite. Even more reason for the shrinker to be able to recover > >vmap space on 32bit systems (for external users, not just ourselves). > > How? I don't see that failed vmapping will trigger shrinking. What > will prevent i915 from accumulating objects with vmaps sticking > around for too long potentially? I read what you wrote as implying the shrinker would be run (which is what I hoped). I made the mistake of just thinking of the allocations around that path, rather than alloc_vmap_area(). Indeed, we would need a new notifier, pretty much for the sole use of 32b. Grr, that will be a nuisance. > >>But as a related question, I wonder why doesn't the LRC require the > >>same !HAS_LLC approach when mapping as ring buffer does here? > > > >We don't try to use stolen for LRC. The main difficulty lies in > >deciding how to map the state object, stolen forces us to use an > >ioremapping through the GTT and so only suitable for write-only > >mappings. However, we could be using the per-context HWS, for which we > >want a CPU accessible, direct pointer. > > I wasn't asking about stolen but the !HAS_LLC path. Even non-stolen > ring buffers will be mapped vie the aperture on !HAS_LLC platforms. > That implies it is about cache coherency and we don't have the same > treatment for the LRC page. Oh, completely missed the point. Yes, I also wonder how the kmap() on the context object works for Braswell without an explicit clflush of at least the TAIL update between requests. > Until your vma->iomap prototype which added the same uncached access > to the LRC as well. > > So my question was, do we need this for cache considerations today, > irrespective of caching the pointer in the VMA. Yes, I think so, but it's hard to say as Braswell breaks in so many different ways when the littlest of stress is applied. -Chris
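No such notifier exists in the kernel this patch targets. One plausible shape for it, assuming a vmalloc-side hook along the lines of a register_vmap_purge_notifier() API and a hypothetical mm.vmap_notifier field (locking elided; i915_gem_shrink_all() is the existing shrinker entry point):

    static int i915_gem_shrinker_vmap(struct notifier_block *nb,
                                      unsigned long event, void *ptr)
    {
            struct drm_i915_private *dev_priv =
                    container_of(nb, struct drm_i915_private,
                                 mm.vmap_notifier);
            unsigned long *freed = ptr;

            /* put_pages() on idle objects also vunmap()s any cached
             * obj->vmapping, returning kernel address space */
            *freed += i915_gem_shrink_all(dev_priv);
            return NOTIFY_DONE;
    }

    /* at driver init: */
    dev_priv->mm.vmap_notifier.notifier_call = i915_gem_shrinker_vmap;
    register_vmap_purge_notifier(&dev_priv->mm.vmap_notifier);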
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 8cd47c6..b237ebd 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2168,10 +2168,7 @@ struct drm_i915_gem_object { struct scatterlist *sg; int last; } get_page; - - /* prime dma-buf support */ - void *dma_buf_vmapping; - int vmapping_count; + void *vmapping; /** Breadcrumb of last rendering to the buffer. * There can only be one writer, but we allow for multiple readers. @@ -2979,6 +2976,12 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj) obj->pages_pin_count--; } +void *__must_check i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj); +static inline void i915_gem_object_unpin_vmap(struct drm_i915_gem_object *obj) +{ + i915_gem_object_unpin_pages(obj); +} + void *__must_check i915_gem_object_vmap_range(struct drm_i915_gem_object *obj, unsigned int first, unsigned int npages); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 0d5709a..14942cf 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2235,6 +2235,11 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj) ops->put_pages(obj); obj->pages = NULL; + if (obj->vmapping) { + vunmap(obj->vmapping); + obj->vmapping = NULL; + } + i915_gem_object_invalidate(obj); return 0; @@ -2447,6 +2452,40 @@ void *i915_gem_object_vmap_range(struct drm_i915_gem_object *obj, return addr; } +/** + * i915_gem_object_pin_vmap - pin a GEM object and map it into kernel space + * @obj: the GEM object to be mapped + * + * Conbines the functions of get_pages(), pin_pages() and vmap_range() on + * the whole object. The caller should release the mappoing by calling + * i915_gem_object_unpin_vmap() when the object is no longer required. + * function. + * + * Returns an ERR_PTR on failure. 
+ */ +void *i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj) +{ + int ret; + + ret = i915_gem_object_get_pages(obj); + if (ret) + return ERR_PTR(ret); + + i915_gem_object_pin_pages(obj); + + if (obj->vmapping == NULL) { + obj->vmapping = i915_gem_object_vmap_range(obj, 0, + obj->base.size >> PAGE_SHIFT); + + if (obj->vmapping == NULL) { + i915_gem_object_unpin_pages(obj); + return ERR_PTR(-ENOMEM); + } + } + + return obj->vmapping; +} + void i915_vma_move_to_active(struct i915_vma *vma, struct drm_i915_gem_request *req) { diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c index aee4149..adc7b5e 100644 --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c @@ -95,14 +95,12 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment, { struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf); - mutex_lock(&obj->base.dev->struct_mutex); - dma_unmap_sg(attachment->dev, sg->sgl, sg->nents, dir); sg_free_table(sg); kfree(sg); + mutex_lock(&obj->base.dev->struct_mutex); i915_gem_object_unpin_pages(obj); - mutex_unlock(&obj->base.dev->struct_mutex); } @@ -110,41 +108,17 @@ static void *i915_gem_dmabuf_vmap(struct dma_buf *dma_buf) { struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf); struct drm_device *dev = obj->base.dev; + void *addr; int ret; ret = i915_mutex_lock_interruptible(dev); if (ret) return ERR_PTR(ret); - if (obj->dma_buf_vmapping) { - obj->vmapping_count++; - goto out_unlock; - } - - ret = i915_gem_object_get_pages(obj); - if (ret) - goto err; - - i915_gem_object_pin_pages(obj); - - ret = -ENOMEM; - - obj->dma_buf_vmapping = i915_gem_object_vmap_range(obj, 0, - dma_buf->size >> PAGE_SHIFT); - - if (!obj->dma_buf_vmapping) - goto err_unpin; - - obj->vmapping_count = 1; -out_unlock: + addr = i915_gem_object_pin_vmap(obj); mutex_unlock(&dev->struct_mutex); - return obj->dma_buf_vmapping; -err_unpin: - i915_gem_object_unpin_pages(obj); -err: - mutex_unlock(&dev->struct_mutex); - return ERR_PTR(ret); + return addr; } static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr) @@ -153,12 +127,7 @@ static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr) struct drm_device *dev = obj->base.dev; mutex_lock(&dev->struct_mutex); - if (--obj->vmapping_count == 0) { - vunmap(obj->dma_buf_vmapping); - obj->dma_buf_vmapping = NULL; - - i915_gem_object_unpin_pages(obj); - } + i915_gem_object_unpin_vmap(obj); mutex_unlock(&dev->struct_mutex); } diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 434a452..47f186e 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -2056,7 +2056,7 @@ static int init_phys_status_page(struct intel_engine_cs *ring) void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf) { if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen) - vunmap(ringbuf->virtual_start); + i915_gem_object_unpin_vmap(ringbuf->obj); else iounmap(ringbuf->virtual_start); ringbuf->virtual_start = NULL; @@ -2077,16 +2077,14 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev, return ret; ret = i915_gem_object_set_to_cpu_domain(obj, true); - if (ret) { - i915_gem_object_ggtt_unpin(obj); - return ret; - } + if (ret) + goto unpin; - ringbuf->virtual_start = i915_gem_object_vmap_range(obj, 0, - ringbuf->size >> PAGE_SHIFT); - if (ringbuf->virtual_start == NULL) { - i915_gem_object_ggtt_unpin(obj); - return -ENOMEM; + 
ringbuf->virtual_start = i915_gem_object_pin_vmap(obj); + if (IS_ERR(ringbuf->virtual_start)) { + ret = PTR_ERR(ringbuf->virtual_start); + ringbuf->virtual_start = NULL; + goto unpin; } } else { ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE); @@ -2094,10 +2092,8 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev, return ret; ret = i915_gem_object_set_to_gtt_domain(obj, true); - if (ret) { - i915_gem_object_ggtt_unpin(obj); - return ret; - } + if (ret) + goto unpin; /* Access through the GTT requires the device to be awake. */ assert_rpm_wakelock_held(dev_priv); @@ -2105,14 +2101,18 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev, ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj), ringbuf->size); if (ringbuf->virtual_start == NULL) { - i915_gem_object_ggtt_unpin(obj); - return -EINVAL; + ret = -ENOMEM; + goto unpin; } } ringbuf->vma = i915_gem_obj_to_ggtt(obj); return 0; + +unpin: + i915_gem_object_ggtt_unpin(obj); + return ret; } static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
This is essentially Chris Wilson's patch of a similar name, reworked on top of Alex Dai's recent patch: drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space Chris' original commentary said: We now have two implementations for vmapping a whole object, one for dma-buf and one for the ringbuffer. If we couple the vmapping into the obj->pages lifetime, then we can reuse an obj->vmapping for both and at the same time couple it into the shrinker. v2: Mark the failable kmalloc() as __GFP_NOWARN (vsyrjala) v3: Call unpin_vmap from the right dmabuf unmapper This patch [v4] reimplements the same functionality, but now as wrappers round the recently-introduced i915_gem_object_vmap_range() from Alex's patch mentioned above. With this change, we have only one vmap() in the whole driver :) Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> --- drivers/gpu/drm/i915/i915_drv.h | 11 +++++---- drivers/gpu/drm/i915/i915_gem.c | 39 +++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/i915_gem_dmabuf.c | 41 ++++----------------------------- drivers/gpu/drm/i915/intel_ringbuffer.c | 32 ++++++++++++------------- 4 files changed, 67 insertions(+), 56 deletions(-)
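The single remaining vmap() sits inside i915_gem_object_vmap_range() from Alex Dai's patch, which is not part of this page. In essence it gathers page pointers from the object's sg table and maps them in one call; a sketch under that assumption, using the obj/first/npages parameters from the declaration above and the drm allocation helpers of the era:

    struct sg_page_iter sg_iter;
    struct page **pages;
    void *addr;
    int i = 0;

    pages = drm_malloc_ab(npages, sizeof(*pages));
    if (pages == NULL)
            return NULL;

    /* collect the struct page pointers backing [first, first+npages) */
    for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, first) {
            pages[i++] = sg_page_iter_page(&sg_iter);
            if (i == npages)
                    break;
    }

    addr = vmap(pages, i, 0, PAGE_KERNEL);  /* the one vmap() call */
    drm_free_large(pages);
    return addr;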