Message ID: 1302640318-23165-25-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State: New, archived
On Tue, 12 Apr 2011 21:31:52 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Replace the three nearly identical copies of the code with a single
> function. And take advantage of the opportunity to do some
> micro-optimisation: avoid the vmalloc if at all possible and also avoid
> dropping the lock unless we are forced to acquire the mm semaphore.

Could we get some performance numbers in patches that add code for
performance?
On Wed, 13 Apr 2011 08:59:55 -0700, Eric Anholt <eric@anholt.net> wrote:
> On Tue, 12 Apr 2011 21:31:52 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > Replace the three nearly identical copies of the code with a single
> > function. And take advantage of the opportunity to do some
> > micro-optimisation: avoid the vmalloc if at all possible and also avoid
> > dropping the lock unless we are forced to acquire the mm semaphore.
>
> Could we get some performance numbers in patches that add code for
> performance?

For myself, this was justified by simply refactoring the common code.
However, x11perf -aa10text on pnv:
  before: 1.28 Mglyph/sec
  after:  1.45 Mglyph/sec

I have my SNB box doing a more thorough analysis of the difference for
various pwrite sizes (assuming that the likelihood of faulting is not
totally workload dependent.)
-Chris
On Tue, Apr 12, 2011 at 09:31:52PM +0100, Chris Wilson wrote:
> Replace the three nearly identical copies of the code with a single
> function. And take advantage of the opportunity to do some
> micro-optimisation: avoid the vmalloc if at all possible and also avoid
> dropping the lock unless we are forced to acquire the mm semaphore.

One tiny nitpick: Perhaps put an api comment at the top of
gem_get_user_pages that this function drops the struct_mutex. That's not
something we normally do and could cause endless amounts of fun if
neglected.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On Wed, 13 Apr 2011 18:24:36 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Wed, 13 Apr 2011 08:59:55 -0700, Eric Anholt <eric@anholt.net> wrote:
> > Could we get some performance numbers in patches that add code for
> > performance?
>
> For myself, this was justified by simply refactoring the common code.
> However, x11perf -aa10text on pnv:
>   before: 1.28 Mglyph/sec
>   after:  1.45 Mglyph/sec

Awesome.
On Wed, 13 Apr 2011 21:26:24 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Tue, Apr 12, 2011 at 09:31:52PM +0100, Chris Wilson wrote:
> > Replace the three nearly identical copies of the code with a single
> > function. And take advantage of the opportunity to do some
> > micro-optimisation: avoid the vmalloc if at all possible and also avoid
> > dropping the lock unless we are forced to acquire the mm semaphore.
>
> One tiny nitpick: Perhaps put an api comment at the top of
> gem_get_user_pages that this function drops the struct_mutex. That's not
> something we normally do and could cause endless amounts of fun if
> neglected.

How about:

/**
 * Magically retrieves the pages for the user addr whilst holding the
 * dev->struct_mutex.
 *
 * Since we can not take the mm semaphore whilst holding our dev->struct_mutex,
 * due to the pre-existing lock dependency established by i915_gem_fault(),
 * we have to perform some sleight-of-hand.
 *
 * First, we try the lockless variant of gup whilst continuing to hold the
 * mutex. If that fails to get all the user pages, then we no choice but
 * to acquire the mm semaphore (thus dropping the lock on dev->struct_mutex
 * to do so). The dev->struct_mutex is then re-acquired before we return.
 *
 * Returns: an error code *and* the number of user pages acquired. Even
 * on an error, you must iterate over the return pages and release them.
 */

?
-Chris
On Wed, Apr 13, 2011 at 08:56:26PM +0100, Chris Wilson wrote:
> On Wed, 13 Apr 2011 21:26:24 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> > One tiny nitpick: Perhaps put an api comment at the top of
> > gem_get_user_pages that this function drops the struct_mutex. That's not
> > something we normally do and could cause endless amounts of fun if
> > neglected.
>
> How about:
>
> /**
>  * Magically retrieves the pages for the user addr whilst holding the
>  * dev->struct_mutex.
>  *
>  * Since we can not take the mm semaphore whilst holding our dev->struct_mutex,
>  * due to the pre-existing lock dependency established by i915_gem_fault(),
>  * we have to perform some sleight-of-hand.
>  *
>  * First, we try the lockless variant of gup whilst continuing to hold the
>  * mutex. If that fails to get all the user pages, then we no choice but
>  * to acquire the mm semaphore (thus dropping the lock on dev->struct_mutex
>  * to do so). The dev->struct_mutex is then re-acquired before we return.
>  *
>  * Returns: an error code *and* the number of user pages acquired. Even
>  * on an error, you must iterate over the return pages and release them.
>  */

Perfect. And just reminded me that my review wasn't too careful, I've
glossed a bit over that num_pages detail ...
-Daniel
On Wed, Apr 13, 2011 at 08:56:26PM +0100, Chris Wilson wrote:
> On Wed, 13 Apr 2011 21:26:24 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> > One tiny nitpick: Perhaps put an api comment at the top of
> > gem_get_user_pages that this function drops the struct_mutex. That's not
> > something we normally do and could cause endless amounts of fun if
> > neglected.
>
> How about:
>
> [...]
>
> ?
> -Chris

I like this patch...

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
On Thursday, 14.04.2011, at 16:23 -0700, Ben Widawsky wrote:
> On Wed, Apr 13, 2011 at 08:56:26PM +0100, Chris Wilson wrote:
> > How about:
> >
> > /**
> >  * Magically retrieves the pages for the user addr whilst holding the
> >  * dev->struct_mutex.
> >  *
> >  * Since we can not take the mm semaphore whilst holding our dev->struct_mutex,
> >  * due to the pre-existing lock dependency established by i915_gem_fault(),
> >  * we have to perform some sleight-of-hand.
> >  *
> >  * First, we try the lockless variant of gup whilst continuing to hold the

I do not know what »gup« means.

> >  * mutex. If that fails to get all the user pages, then we no choice but

s/then we no/then we have no/

> >  * to acquire the mm semaphore (thus dropping the lock on dev->struct_mutex
> >  * to do so). The dev->struct_mutex is then re-acquired before we return.
> >  *
> >  * Returns: an error code *and* the number of user pages acquired. Even
> >  * on an error, you must iterate over the return pages and release them.
> >  */
> >
> > ?
> > -Chris
>
> I like this patch...
>
> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>

Reviewed-by: Paul Menzel <paulepanter@users.sourceforge.net>

Thanks,

Paul
On Fri, 15 Apr 2011 11:48:33 +0200, Paul Menzel <paulepanter@users.sourceforge.net> wrote:
> On Thursday, 14.04.2011, at 16:23 -0700, Ben Widawsky wrote:
> > On Wed, Apr 13, 2011 at 08:56:26PM +0100, Chris Wilson wrote:
> > > * First, we try the lockless variant of gup whilst continuing to hold the
>
> I do not know what »gup« means.

I guessed that people would recognise the common abbreviation for
get_user_pages, but not many people know about the different variants.
(I didn't until I dug through the headers trying to find a way to avoid
the mm semaphore.) So being explicit here helps, thanks.
-Chris
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 33830c9..0028f3b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -257,6 +257,56 @@ static int i915_gem_object_needs_bit17_swizzle(struct drm_i915_gem_object *obj)
 		obj->tiling_mode != I915_TILING_NONE;
 }
 
+static int
+i915_gem_get_user_pages(struct drm_device *dev,
+			unsigned long addr,
+			bool write,
+			int *num_pages,
+			struct page ***pages_out)
+{
+	struct page **pages;
+	int pinned, ret;
+	int n = *num_pages;
+
+	pages = kmalloc(n*sizeof(struct page *),
+			GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
+	if (pages == NULL) {
+		pages = drm_malloc_ab(n, sizeof(struct page *));
+		if (pages == NULL) {
+			*pages_out = NULL;
+			*num_pages = 0;
+			return -ENOMEM;
+		}
+	}
+
+	pinned = __get_user_pages_fast(addr, n, write, pages);
+	if (pinned < n) {
+		struct mm_struct *mm = current->mm;
+
+		mutex_unlock(&dev->struct_mutex);
+		down_read(&mm->mmap_sem);
+		ret = get_user_pages(current, mm,
+				     addr + (pinned << PAGE_SHIFT),
+				     n - pinned,
+				     write, 0,
+				     pages + pinned,
+				     NULL);
+		up_read(&mm->mmap_sem);
+		mutex_lock(&dev->struct_mutex);
+		if (ret > 0)
+			pinned += ret;
+	}
+
+	ret = 0;
+	if (pinned < n)
+		ret = -EFAULT;
+
+	*num_pages = pinned;
+	*pages_out = pages;
+	return ret;
+}
+
+
 static inline void
 slow_shmem_copy(struct page *dst_page,
 		int dst_offset,
@@ -398,11 +448,11 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 			  struct drm_file *file)
 {
 	struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
-	struct mm_struct *mm = current->mm;
 	struct page **user_pages;
 	ssize_t remain;
-	loff_t offset, pinned_pages, i;
-	loff_t first_data_page, last_data_page, num_pages;
+	loff_t offset;
+	loff_t first_data_page, last_data_page;
+	int num_pages, i;
 	int shmem_page_offset;
 	int data_page_index, data_page_offset;
 	int page_length;
@@ -420,20 +470,10 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 	last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE;
 	num_pages = last_data_page - first_data_page + 1;
 
-	user_pages = drm_malloc_ab(num_pages, sizeof(struct page *));
-	if (user_pages == NULL)
-		return -ENOMEM;
-
-	mutex_unlock(&dev->struct_mutex);
-	down_read(&mm->mmap_sem);
-	pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr,
-				      num_pages, 1, 0, user_pages, NULL);
-	up_read(&mm->mmap_sem);
-	mutex_lock(&dev->struct_mutex);
-	if (pinned_pages < num_pages) {
-		ret = -EFAULT;
+	ret = i915_gem_get_user_pages(dev, data_ptr, true,
+				      &num_pages, &user_pages);
+	if (ret)
 		goto out;
-	}
 
 	ret = i915_gem_object_set_cpu_read_domain_range(obj,
 							args->offset,
@@ -494,7 +534,7 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 	}
 
 out:
-	for (i = 0; i < pinned_pages; i++) {
+	for (i = 0; i < num_pages; i++) {
 		SetPageDirty(user_pages[i]);
 		mark_page_accessed(user_pages[i]);
 		page_cache_release(user_pages[i]);
@@ -679,10 +719,9 @@ i915_gem_gtt_pwrite_slow(struct drm_device *dev,
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	ssize_t remain;
 	loff_t gtt_page_base, offset;
-	loff_t first_data_page, last_data_page, num_pages;
-	loff_t pinned_pages, i;
+	loff_t first_data_page, last_data_page;
+	int num_pages, i;
 	struct page **user_pages;
-	struct mm_struct *mm = current->mm;
 	int gtt_page_offset, data_page_offset, data_page_index, page_length;
 	int ret;
 	uint64_t data_ptr = args->data_ptr;
@@ -697,28 +736,18 @@ i915_gem_gtt_pwrite_slow(struct drm_device *dev,
 	last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE;
 	num_pages = last_data_page - first_data_page + 1;
 
-	user_pages = drm_malloc_ab(num_pages, sizeof(struct page *));
-	if (user_pages == NULL)
-		return -ENOMEM;
-
-	mutex_unlock(&dev->struct_mutex);
-	down_read(&mm->mmap_sem);
-	pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr,
-				      num_pages, 0, 0, user_pages, NULL);
-	up_read(&mm->mmap_sem);
-	mutex_lock(&dev->struct_mutex);
-	if (pinned_pages < num_pages) {
-		ret = -EFAULT;
-		goto out_unpin_pages;
-	}
+	ret = i915_gem_get_user_pages(dev, data_ptr, false,
+				      &num_pages, &user_pages);
+	if (ret)
+		goto out;
 
 	ret = i915_gem_object_set_to_gtt_domain(obj, true);
 	if (ret)
-		goto out_unpin_pages;
+		goto out;
 
 	ret = i915_gem_object_put_fence(obj);
 	if (ret)
-		goto out_unpin_pages;
+		goto out;
 
 	offset = obj->gtt_offset + args->offset;
 
@@ -753,8 +782,8 @@ i915_gem_gtt_pwrite_slow(struct drm_device *dev,
 		data_ptr += page_length;
 	}
 
-out_unpin_pages:
-	for (i = 0; i < pinned_pages; i++)
+out:
+	for (i = 0; i < num_pages; i++)
 		page_cache_release(user_pages[i]);
 
 	drm_free_large(user_pages);
@@ -803,11 +832,11 @@ i915_gem_shmem_pwrite_fast(struct drm_device *dev,
 		if (IS_ERR(page))
 			return PTR_ERR(page);
 
-		vaddr = kmap_atomic(page, KM_USER0);
+		vaddr = kmap_atomic(page);
 		ret = __copy_from_user_inatomic(vaddr + page_offset,
 						user_data,
 						page_length);
-		kunmap_atomic(vaddr, KM_USER0);
+		kunmap_atomic(vaddr);
 
 		set_page_dirty(page);
 		mark_page_accessed(page);
@@ -842,11 +871,10 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 			   struct drm_file *file)
 {
 	struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
-	struct mm_struct *mm = current->mm;
 	struct page **user_pages;
 	ssize_t remain;
-	loff_t offset, pinned_pages, i;
-	loff_t first_data_page, last_data_page, num_pages;
+	loff_t first_data_page, last_data_page, offset;
+	int num_pages, i;
 	int shmem_page_offset;
 	int data_page_index, data_page_offset;
 	int page_length;
@@ -864,20 +892,10 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 	last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE;
 	num_pages = last_data_page - first_data_page + 1;
 
-	user_pages = drm_malloc_ab(num_pages, sizeof(struct page *));
-	if (user_pages == NULL)
-		return -ENOMEM;
-
-	mutex_unlock(&dev->struct_mutex);
-	down_read(&mm->mmap_sem);
-	pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr,
-				      num_pages, 0, 0, user_pages, NULL);
-	up_read(&mm->mmap_sem);
-	mutex_lock(&dev->struct_mutex);
-	if (pinned_pages < num_pages) {
-		ret = -EFAULT;
+	ret = i915_gem_get_user_pages(dev, data_ptr, false,
+				      &num_pages, &user_pages);
+	if (ret)
 		goto out;
-	}
 
 	ret = i915_gem_object_set_to_cpu_domain(obj, 1);
 	if (ret)
@@ -940,7 +958,7 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 	}
 
 out:
-	for (i = 0; i < pinned_pages; i++)
+	for (i = 0; i < num_pages; i++)
 		page_cache_release(user_pages[i]);
 
 	drm_free_large(user_pages);
Replace the three nearly identical copies of the code with a single
function. And take advantage of the opportunity to do some
micro-optimisation: avoid the vmalloc if at all possible and also avoid
dropping the lock unless we are forced to acquire the mm semaphore.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c |  132 ++++++++++++++++++++++-----------------
 1 files changed, 75 insertions(+), 57 deletions(-)