Message ID | 20190904115644.7620-1-tzimmermann@suse.de (mailing list archive) |
---|---|
Headers | show |
Series | ast, mgag200: Map console BO while it's being displayed | expand |
On Wed, Sep 4, 2019 at 1:56 PM Thomas Zimmermann <tzimmermann@suse.de> wrote: > > (was: drm/vram-helper: Fix performance regression in fbdev) > > Generic fbdev emulation maps and unmaps the console BO for updating it's > content from the shadow buffer. If this involves an actual mapping > operation (instead of reusing an existing mapping), lots of debug messages > may be printed, such as > > x86/PAT: Overlap at 0xd0000000-0xd1000000 > x86/PAT: reserve_memtype added [mem 0xd0000000-0xd02fffff], track write-combining, req write-combining, ret write-combining > x86/PAT: free_memtype request [mem 0xd0000000-0xd02fffff] > > as reported at [1]. Drivers using VRAM helpers may also see reduced > performance as the mapping operations can create overhead. > > This patch set fixes the problem by adding a ref counter to the GEM > VRAM buffers' kmap operation, and keeping the fbdev's console buffer > mapped while the console is being displayed. These changes avoid the > frequent mappings in the fbdev code. The drivers, ast and mgag200, > map the console's buffer when it becomes visible and the fbdev code > reuses this mapping. The original fbdev code in ast and mgag200 used > the same strategy. > > [1] https://lists.freedesktop.org/archives/dri-devel/2019-September/234308.html As discussed on irc a bit, here's my thoughts: - imo we should fix this by using the io_mapping stuff, that avoids the overhead of repeated pat checks for map/unmap. It should also cut down on remapping costs a lot in general (at least on 64bit kernels, which is like everything nowadays). But it's a lot more work to roll out I guess. I think this would be the much better longterm fix. - this here only works when fbcon is active, the noise will come back when you start X or wayland. We should probably check whether the display is active with drm_master_internal_acquire (and reupload when we restore the entire console in the restore function - could just launch the worker for that). I'm also not sure whether we have a real problem here, it's just debug noise that we're fighting here? -Daniel > > v2: > * fixed comment typos > > Thomas Zimmermann (3): > drm/vram: Add kmap ref-counting to GEM VRAM objects > drm/ast: Map fbdev framebuffer while it's being displayed > drm/mgag200: Map fbdev framebuffer while it's being displayed > > drivers/gpu/drm/ast/ast_mode.c | 19 +++++++ > drivers/gpu/drm/drm_gem_vram_helper.c | 74 +++++++++++++++++++------- > drivers/gpu/drm/mgag200/mgag200_mode.c | 20 +++++++ > include/drm/drm_gem_vram_helper.h | 19 +++++++ > 4 files changed, 114 insertions(+), 18 deletions(-) > > -- > 2.23.0 >
On Wed, 04 Sep 2019, Daniel Vetter wrote: >I'm also not sure whether we have a real problem here, it's just debug >noise that we're fighting here? It is non stop debug noise as the memory range in question is being added + deleted over and over. I doubt we want to be burning cycles like this. Thanks, Davidlohr
On Wed, Sep 4, 2019 at 7:14 PM Davidlohr Bueso <dave@stgolabs.net> wrote: > On Wed, 04 Sep 2019, Daniel Vetter wrote: > >I'm also not sure whether we have a real problem here, it's just debug > >noise that we're fighting here? > > It is non stop debug noise as the memory range in question is being added + > deleted over and over. I doubt we want to be burning cycles like this. Yeah the proper fix is setting up an io_mapping in ttm (or drivers) so the pat tracking is cached, and then using the right pte wrangling functions. But that's a lot more involved fix, and from all the testing we've done the pte rewriting itself doesn't seem to be the biggest issue with mgag200 being slow ... -Daniel
Hi, > - imo we should fix this by using the io_mapping stuff, that avoids > the overhead of repeated pat checks for map/unmap. Another idea: IIRC ttm has a move_notify callback. So we could simply keep mappings active even when the refcount goes down to zero. Then do the actual unmap either in the move_notify or in the destroy callback. cheers, Gerd
On Thu, Sep 5, 2019 at 9:01 AM Gerd Hoffmann <kraxel@redhat.com> wrote: > > Hi, > > > - imo we should fix this by using the io_mapping stuff, that avoids > > the overhead of repeated pat checks for map/unmap. > > Another idea: IIRC ttm has a move_notify callback. So we could simply > keep mappings active even when the refcount goes down to zero. Then do > the actual unmap either in the move_notify or in the destroy callback. Yeah that should be a really clean solution, and only needs changes in the vram helpers. Which is nice, more common code! -Daniel
Hi Am 05.09.19 um 09:56 schrieb Daniel Vetter: > On Thu, Sep 5, 2019 at 9:01 AM Gerd Hoffmann <kraxel@redhat.com> wrote: >> >> Hi, >> >>> - imo we should fix this by using the io_mapping stuff, that avoids >>> the overhead of repeated pat checks for map/unmap. >> >> Another idea: IIRC ttm has a move_notify callback. So we could simply >> keep mappings active even when the refcount goes down to zero. Then do >> the actual unmap either in the move_notify or in the destroy callback. > > Yeah that should be a really clean solution, and only needs changes in > the vram helpers. Which is nice, more common code! But the console's BO is special wrt to mapping. Putting special code for console handling into vram helpers doesn't seem right. I think it's preferable to keep the special cases located in fbdev emulation. Or even better in DRM client code, so that other, future, internal clients automatically do the right thing. Best regards Thomas > -Daniel >
On Thu, Sep 05, 2019 at 10:19:40AM +0200, Thomas Zimmermann wrote: > Hi > > Am 05.09.19 um 09:56 schrieb Daniel Vetter: > > On Thu, Sep 5, 2019 at 9:01 AM Gerd Hoffmann <kraxel@redhat.com> wrote: > >> > >> Hi, > >> > >>> - imo we should fix this by using the io_mapping stuff, that avoids > >>> the overhead of repeated pat checks for map/unmap. > >> > >> Another idea: IIRC ttm has a move_notify callback. So we could simply > >> keep mappings active even when the refcount goes down to zero. Then do > >> the actual unmap either in the move_notify or in the destroy callback. > > > > Yeah that should be a really clean solution, and only needs changes in > > the vram helpers. Which is nice, more common code! > > But the console's BO is special wrt to mapping. Putting special code for > console handling into vram helpers doesn't seem right. I have no special handling in mind. I think we can simply do that for all gem objects, no matter whenever they are created by fbcon or userspace (wayland/xorg/whatever). vmap will create a mapping (or increase the reference count). vunmap decreases the reference count, when it goes down to zero unpin it but keep the mapping. The actual unmap happens when ttm wants move the object from VRAM to SYSTEM due to VRAM being full. In case vram has room for all our objects we simply never unmap. hope this clarifies, Gerd
Hi Am 05.09.19 um 11:29 schrieb Gerd Hoffmann: > On Thu, Sep 05, 2019 at 10:19:40AM +0200, Thomas Zimmermann wrote: >> Hi >> >> Am 05.09.19 um 09:56 schrieb Daniel Vetter: >>> On Thu, Sep 5, 2019 at 9:01 AM Gerd Hoffmann <kraxel@redhat.com> wrote: >>>> >>>> Hi, >>>> >>>>> - imo we should fix this by using the io_mapping stuff, that avoids >>>>> the overhead of repeated pat checks for map/unmap. >>>> >>>> Another idea: IIRC ttm has a move_notify callback. So we could simply >>>> keep mappings active even when the refcount goes down to zero. Then do >>>> the actual unmap either in the move_notify or in the destroy callback. >>> >>> Yeah that should be a really clean solution, and only needs changes in >>> the vram helpers. Which is nice, more common code! >> >> But the console's BO is special wrt to mapping. Putting special code for >> console handling into vram helpers doesn't seem right. > > I have no special handling in mind. I think we can simply do that for > all gem objects, no matter whenever they are created by fbcon or > userspace (wayland/xorg/whatever). vmap will create a mapping (or > increase the reference count). vunmap decreases the reference count, > when it goes down to zero unpin it but keep the mapping. The actual > unmap happens when ttm wants move the object from VRAM to SYSTEM due to > VRAM being full. In case vram has room for all our objects we simply > never unmap. That's pretty cool. Thanks for clarifying. I think it's the solution I was looking for. Best regards Thomas > > hope this clarifies, > Gerd > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel >