Message ID | 1310200731-18086-2-git-send-email-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sat, 9 Jul 2011 09:38:51 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > These paths hold onto the struct mutex whilst accessing pages. In > order, to prevent a recursive dead-lock should we fault-in a GTT mapped > page we need to return -EFAULT and fallback to the slow path. > > Lockdep has complained before about the potential dead-lock, but rvis is > the first application found to sufficiently abuse the API to trigger it. > > Cursory performance regression testing on a 1GiB PineView system using > x11perf, cairo-perf-trace, glxgears and a few game benchmarks suggested > no large regressions with just a 2% slowdown for firefox. The caveat is > that this was an otherwise idle system and that for 32-bit systems > io_mapping_map_atomic_wc() already disabled page-faults. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38115 > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_gem.c | 22 ++++++++++++++++++++-- > 1 files changed, 20 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index 2fce620..ecb27fd 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > - vaddr = kmap_atomic(page, KM_USER0); > + vaddr = kmap_atomic(page); > + /* We have to disable faulting here in case the user address > + * is really a GTT mapping and so we can not enter > + * i915_gem_fault() whilst already holding struct_mutex. > + */ > + pagefault_disable(); > ret = __copy_from_user_inatomic(vaddr + page_offset, > user_data, > page_length); > - kunmap_atomic(vaddr, KM_USER0); > + pagefault_enable(); > + kunmap_atomic(vaddr); does this even compile? Looks like you dropped an arg. Looks like a reasonable fix for the problem, though.
On Sat, 09 Jul 2011 07:41:57 -0700, Eric Anholt <eric@anholt.net> wrote: > On Sat, 9 Jul 2011 09:38:51 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > > - vaddr = kmap_atomic(page, KM_USER0); > > + vaddr = kmap_atomic(page); > > + /* We have to disable faulting here in case the user address > > + * is really a GTT mapping and so we can not enter > > + * i915_gem_fault() whilst already holding struct_mutex. > > + */ > > + pagefault_disable(); > > ret = __copy_from_user_inatomic(vaddr + page_offset, > > user_data, > > page_length); > > - kunmap_atomic(vaddr, KM_USER0); > > + pagefault_enable(); > > + kunmap_atomic(vaddr); > > does this even compile? Looks like you dropped an arg. That parameter was removed several months ago and although a pass was made through the kernel to update all callsites, this one inexplicably remained. commit t 3e4d3af501cccdc8a8cca41bdbe57d54ad7e7e73 Author: Peter Zijlstra <a.p.zijlstra@chello.nl> Date: Tue Oct 26 14:21:51 2010 -0700 mm: stack based kmap_atomic() -Chris
On Sat, 9 Jul 2011 09:38:51 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > + /* We have to disable faulting here in case the user address > + * is really a GTT mapping and so we can not enter > + * i915_gem_fault() whilst already holding struct_mutex. > + */ I would (far, far) rather disallow pread through the GTT mapping. There's no credible reason to allow it. Is there some reasonably fast way to detect that these addresses are within the GTT and just bail? Any performance penalty that serves solely to enable abuse of the interface is not reasonable.
On Sat, 09 Jul 2011 13:24:02 -0700, Keith Packard <keithp@keithp.com> wrote: > On Sat, 9 Jul 2011 09:38:51 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > > + /* We have to disable faulting here in case the user address > > + * is really a GTT mapping and so we can not enter > > + * i915_gem_fault() whilst already holding struct_mutex. > > + */ > > I would (far, far) rather disallow pread through the GTT > mapping. There's no credible reason to allow it. Is there some > reasonably fast way to detect that these addresses are within the GTT > and just bail? Something like: vma = find_vma(current->mm, uaddr); if (vma->vm_ops == dev->driver->gem_vm_ops) return -EINVAL; I think would do, find_vma() is not necessary cheap though, and there are a couple of optimisations that we haven't done for pwrite/pread yet to speed up the transition to the slow path. > Any performance penalty that serves solely to enable abuse of the > interface is not reasonable. The current code generates lockdep OOPSes and inconsistently applies pagefault_disable along some paths, in particular for 32-bit kernels, but not others. And the abuse is permitted through the OpenGL specification, I believe. The offending app is just doing glBufferData(glMapBuffer()), iiuc; -Chris
On Sat, 09 Jul 2011 21:50:26 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > I think would do, find_vma() is not necessary cheap though, and there are a > couple of optimisations that we haven't done for pwrite/pread yet to speed > up the transition to the slow path. Yeah, find_vma is a rb tree walk over the whole address space. Yikes! And, of course, we'd actually need to walk over the whole mapping in case the application managed to walk from non-GTT space into GTT space. > The current code generates lockdep OOPSes and inconsistently applies > pagefault_disable along some paths, in particular for 32-bit kernels, > but not others. And the abuse is permitted through the OpenGL > specification, I believe. The offending app is just doing > glBufferData(glMapBuffer()), iiuc; Sure, it's permitted, so ideally we'd detect this abuse and fall back to the slow path, but we need a cheap check which takes the slow path, perhaps pessimistically.
On Sat, 09 Jul 2011 14:06:23 -0700, Keith Packard <keithp@keithp.com> wrote: > On Sat, 09 Jul 2011 21:50:26 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > Sure, it's permitted, so ideally we'd detect this abuse and fall back to > the slow path, but we need a cheap check which takes the slow path, > perhaps pessimistically. If we prefault every page, then we only hit the slow path under system load combined or severe memory pressure. I don't think that is too bad a compromise. I can stick some counters in there and find out what the impact actually is. -Chris
On Sat, 09 Jul 2011 22:23:28 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > On Sat, 09 Jul 2011 14:06:23 -0700, Keith Packard <keithp@keithp.com> wrote: > > On Sat, 09 Jul 2011 21:50:26 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > Sure, it's permitted, so ideally we'd detect this abuse and fall back to > > the slow path, but we need a cheap check which takes the slow path, > > perhaps pessimistically. > > If we prefault every page, then we only hit the slow path under system > load combined or severe memory pressure. I don't think that is too bad a > compromise. > > I can stick some counters in there and find out what the impact actually > is. Cool. I've separately posted a query to lkml asking if the existing fault_in_pages_writeable/fault_in_pages_readable should be fixed - that looks like it will provide a performance boost for filesystem reads larger than two pages.
On Sat, 09 Jul 2011 13:24:02 -0700, Keith Packard <keithp@keithp.com> wrote: > On Sat, 9 Jul 2011 09:38:51 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > > + /* We have to disable faulting here in case the user address > > + * is really a GTT mapping and so we can not enter > > + * i915_gem_fault() whilst already holding struct_mutex. > > + */ > > I would (far, far) rather disallow pread through the GTT > mapping. There's no credible reason to allow it. Is there some > reasonably fast way to detect that these addresses are within the GTT > and just bail? That means that I can't give users of GL pointers to GTT mappings for the buffer mapping API, because then I'd have to check in userland whether the pointer they give me for other API entrypoints is to a known GTT mapping before doing a pread into it. And then I imagine cross-API interactions and shudder.
On Sun, 10 Jul 2011 11:45:29 -0700, Eric Anholt <eric@anholt.net> wrote: > That means that I can't give users of GL pointers to GTT mappings for > the buffer mapping API, because then I'd have to check in userland > whether the pointer they give me for other API entrypoints is to a known > GTT mapping before doing a pread into it. And then I imagine cross-API > interactions and shudder. Yeah, it clearly has to work; it's not clear how fast it needs to be. However, if we get the prefaulting stuff fixed, we should still be able to hit the fast path most of the time.
On Sat, 09 Jul 2011 15:07:28 -0700, Keith Packard <keithp@keithp.com> wrote: Non-text part: multipart/signed > On Sat, 09 Jul 2011 22:23:28 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > > I can stick some counters in there and find out what the impact actually > > is. On a SNB laptop, so surplus memory and surplus processing power, we don't see that many faults at all whilst messing around with firefox, gitk and a couple of timedemos (padman, openarena): no-prefaulting: pread: shmem fast: 1273764/3395, slow: 0/0 pwrite: sheme fast: 51163148/14554, slow: 23552/9 pwrite: gtt fast: 32744702068/12658489, slow: 29376/10 all-prefaulted: pread: shmem fast: 6081544/5999, slow: 0/0 pwrite: shmem fast: 56867580/15932, slow: 0/0 pwrite: gtt fast: 32976168476/12753355, slow: 0/0 I'll look at a PNV netbook next. -Chris
On Mon, 11 Jul 2011 17:51:08 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote: > no-prefaulting: > pread: shmem fast: 1273764/3395, slow: 0/0 > pwrite: sheme fast: 51163148/14554, slow: 23552/9 > pwrite: gtt fast: 32744702068/12658489, slow: 29376/10 Looks like it won't matter that much, at least when you've got plenty of memory.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 2fce620..ecb27fd 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -365,9 +365,15 @@ i915_gem_shmem_pread_fast(struct drm_device *dev, return PTR_ERR(page); vaddr = kmap_atomic(page); + /* We have to disable faulting here in case the user address + * is really a GTT mapping and so we can not enter + * i915_gem_fault() whilst already holding struct_mutex. + */ + pagefault_disable(); ret = __copy_to_user_inatomic(user_data, vaddr + page_offset, page_length); + pagefault_enable(); kunmap_atomic(vaddr); mark_page_accessed(page); @@ -593,8 +599,14 @@ fast_user_write(struct io_mapping *mapping, unsigned long unwritten; vaddr_atomic = io_mapping_map_atomic_wc(mapping, page_base); + /* We have to disable faulting here in case the user address + * is really a GTT mapping and so we can not enter + * i915_gem_fault() whilst already holding struct_mutex. + */ + pagefault_disable(); unwritten = __copy_from_user_inatomic_nocache(vaddr_atomic + page_offset, user_data, length); + pagefault_enable(); io_mapping_unmap_atomic(vaddr_atomic); return unwritten; } @@ -812,11 +824,17 @@ i915_gem_shmem_pwrite_fast(struct drm_device *dev, if (IS_ERR(page)) return PTR_ERR(page); - vaddr = kmap_atomic(page, KM_USER0); + vaddr = kmap_atomic(page); + /* We have to disable faulting here in case the user address + * is really a GTT mapping and so we can not enter + * i915_gem_fault() whilst already holding struct_mutex. + */ + pagefault_disable(); ret = __copy_from_user_inatomic(vaddr + page_offset, user_data, page_length); - kunmap_atomic(vaddr, KM_USER0); + pagefault_enable(); + kunmap_atomic(vaddr); set_page_dirty(page); mark_page_accessed(page);
These paths hold onto the struct mutex whilst accessing pages. In order, to prevent a recursive dead-lock should we fault-in a GTT mapped page we need to return -EFAULT and fallback to the slow path. Lockdep has complained before about the potential dead-lock, but rvis is the first application found to sufficiently abuse the API to trigger it. Cursory performance regression testing on a 1GiB PineView system using x11perf, cairo-perf-trace, glxgears and a few game benchmarks suggested no large regressions with just a 2% slowdown for firefox. The caveat is that this was an otherwise idle system and that for 32-bit systems io_mapping_map_atomic_wc() already disabled page-faults. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38115 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/i915_gem.c | 22 ++++++++++++++++++++-- 1 files changed, 20 insertions(+), 2 deletions(-)