Message ID | 1374213086-2337-3-git-send-email-xiong.y.zhang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Jul 19, 2013 at 01:51:26PM +0800, Xiong Zhang wrote: > fault_in_multipages_writeable() will load fault page into physical > memory, then pread can run fast path, no need to run slow path > > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> Hm, this avoids going through the slowpath for the first page. Does this have an impact on micro-benchmarks (we have a few) or is this just something you've spotted? I'm a bit vary of making the already complicated shmem pread/pwrite paths even more complicated. Chris? -Daniel > --- > drivers/gpu/drm/i915/i915_gem.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index f194d7e..982df1e 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -459,6 +459,7 @@ i915_gem_shmem_pread(struct drm_device *dev, > page_do_bit17_swizzling = obj_do_bit17_swizzling && > (page_to_phys(page) & (1 << 17)) != 0; > > +read_again: > ret = shmem_pread_fast(page, shmem_page_offset, page_length, > user_data, page_do_bit17_swizzling, > needs_clflush); > @@ -475,6 +476,8 @@ i915_gem_shmem_pread(struct drm_device *dev, > * and just continue. */ > (void)ret; > prefaulted = 1; > + mutex_lock(&dev->struct_mutex); > + goto read_again; > } > > ret = shmem_pread_slow(page, shmem_page_offset, page_length, > -- > 1.8.3.2 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
It just influence one page. During my test, I found one read slow path info when enable prefault. So I add this patch. Why don't move fault_in_multipages_writeable() from i915_gem_shmem_pread() to the beginning of i915_gem_pread_ioctl() ? So the pread_ioctl() is like pwrite_ioctl(). Is the cost of fault_in_multipages_writeable() lager than fault_in_multipages_readable() ? See the attachment. thanks -----Original Message----- From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter Sent: Friday, July 19, 2013 3:29 PM To: Zhang, Xiong Y Cc: intel-gfx@lists.freedesktop.org; Chris Wilson Subject: Re: [Intel-gfx] [PATCH 3/3] drm/i915: run shmem_pread_fast() after page fault On Fri, Jul 19, 2013 at 01:51:26PM +0800, Xiong Zhang wrote: > fault_in_multipages_writeable() will load fault page into physical > memory, then pread can run fast path, no need to run slow path > > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> Hm, this avoids going through the slowpath for the first page. Does this have an impact on micro-benchmarks (we have a few) or is this just something you've spotted? I'm a bit vary of making the already complicated shmem pread/pwrite paths even more complicated. Chris? -Daniel > --- > drivers/gpu/drm/i915/i915_gem.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c > b/drivers/gpu/drm/i915/i915_gem.c index f194d7e..982df1e 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -459,6 +459,7 @@ i915_gem_shmem_pread(struct drm_device *dev, > page_do_bit17_swizzling = obj_do_bit17_swizzling && > (page_to_phys(page) & (1 << 17)) != 0; > > +read_again: > ret = shmem_pread_fast(page, shmem_page_offset, page_length, > user_data, page_do_bit17_swizzling, > needs_clflush); > @@ -475,6 +476,8 @@ i915_gem_shmem_pread(struct drm_device *dev, > * and just continue. */ > (void)ret; > prefaulted = 1; > + mutex_lock(&dev->struct_mutex); > + goto read_again; > } > > ret = shmem_pread_slow(page, shmem_page_offset, page_length, > -- > 1.8.3.2 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Fri, Jul 19, 2013 at 08:52:39AM +0000, Zhang, Xiong Y wrote: > It just influence one page. > During my test, I found one read slow path info when enable prefault. So > I add this patch. Why don't move fault_in_multipages_writeable() from > i915_gem_shmem_pread() to the beginning of i915_gem_pread_ioctl() ? So > the pread_ioctl() is like pwrite_ioctl(). Is the cost of > fault_in_multipages_writeable() lager than > fault_in_multipages_readable() ? See the attachment. It's a slight matter of correctness fixed in: commit 96d79b52701758404cf8701986891afc99ce810b Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Sun Mar 25 19:47:36 2012 +0200 drm/i915: don't clobber userspace memory before commiting to the pread The issue is that we can't prefault before we know that we'll do the write and prefaulting earlier will write garbage into userspace's memory. Hence the tricky ordering. I guess we should add a comment why this is so to the code. -Daniel > > thanks > -----Original Message----- > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter > Sent: Friday, July 19, 2013 3:29 PM > To: Zhang, Xiong Y > Cc: intel-gfx@lists.freedesktop.org; Chris Wilson > Subject: Re: [Intel-gfx] [PATCH 3/3] drm/i915: run shmem_pread_fast() after page fault > > On Fri, Jul 19, 2013 at 01:51:26PM +0800, Xiong Zhang wrote: > > fault_in_multipages_writeable() will load fault page into physical > > memory, then pread can run fast path, no need to run slow path > > > > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> > > Hm, this avoids going through the slowpath for the first page. Does this have an impact on micro-benchmarks (we have a few) or is this just something you've spotted? I'm a bit vary of making the already complicated shmem pread/pwrite paths even more complicated. Chris? > -Daniel > > > --- > > drivers/gpu/drm/i915/i915_gem.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c > > b/drivers/gpu/drm/i915/i915_gem.c index f194d7e..982df1e 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -459,6 +459,7 @@ i915_gem_shmem_pread(struct drm_device *dev, > > page_do_bit17_swizzling = obj_do_bit17_swizzling && > > (page_to_phys(page) & (1 << 17)) != 0; > > > > +read_again: > > ret = shmem_pread_fast(page, shmem_page_offset, page_length, > > user_data, page_do_bit17_swizzling, > > needs_clflush); > > @@ -475,6 +476,8 @@ i915_gem_shmem_pread(struct drm_device *dev, > > * and just continue. */ > > (void)ret; > > prefaulted = 1; > > + mutex_lock(&dev->struct_mutex); > > + goto read_again; > > } > > > > ret = shmem_pread_slow(page, shmem_page_offset, page_length, > > -- > > 1.8.3.2 > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Fri, Jul 19, 2013 at 08:52:39AM +0000, Zhang, Xiong Y wrote: > It just influence one page. > During my test, I found one read slow path info when enable prefault. So I add this patch. > Why don't move fault_in_multipages_writeable() from i915_gem_shmem_pread() to the beginning of i915_gem_pread_ioctl() ? > So the pread_ioctl() is like pwrite_ioctl(). Is the cost of fault_in_multipages_writeable() lager than fault_in_multipages_readable() ? The essence of the problem with fault_in_multipages_writeable() is that it writes a 0 without us guarranteeing to replace it with correct data. Real world usage (at least in the ddx) of pread is limited to UXA readbacks (x11perf -shmget would be the best test case), and more or less gen2 readbacks in SNA. (For SNA we first try to use CPU bo and various maps before resorting to pread.) You will want to extend the microbenchmarks such as i-g-t/gem_gtt_speed to cover cases which can be prefaulted and where prefaulting helps. If you can measure a difference between the paths in the current code, you are ready to see if you can measure an improvement from your patch. -Chris
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index f194d7e..982df1e 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -459,6 +459,7 @@ i915_gem_shmem_pread(struct drm_device *dev, page_do_bit17_swizzling = obj_do_bit17_swizzling && (page_to_phys(page) & (1 << 17)) != 0; +read_again: ret = shmem_pread_fast(page, shmem_page_offset, page_length, user_data, page_do_bit17_swizzling, needs_clflush); @@ -475,6 +476,8 @@ i915_gem_shmem_pread(struct drm_device *dev, * and just continue. */ (void)ret; prefaulted = 1; + mutex_lock(&dev->struct_mutex); + goto read_again; } ret = shmem_pread_slow(page, shmem_page_offset, page_length,
fault_in_multipages_writeable() will load fault page into physical memory, then pread can run fast path, no need to run slow path Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> --- drivers/gpu/drm/i915/i915_gem.c | 3 +++ 1 file changed, 3 insertions(+)