mbox series

[v2,0/3] ast, mgag200: Map console BO while it's being displayed

Message ID 20190904115644.7620-1-tzimmermann@suse.de (mailing list archive)
Headers show
Series ast, mgag200: Map console BO while it's being displayed | expand

Message

Thomas Zimmermann Sept. 4, 2019, 11:56 a.m. UTC
(was: drm/vram-helper: Fix performance regression in fbdev)

Generic fbdev emulation maps and unmaps the console BO for updating it's
content from the shadow buffer. If this involves an actual mapping
operation (instead of reusing an existing mapping), lots of debug messages
may be printed, such as

  x86/PAT: Overlap at 0xd0000000-0xd1000000
  x86/PAT: reserve_memtype added [mem 0xd0000000-0xd02fffff], track write-combining, req write-combining, ret write-combining
  x86/PAT: free_memtype request [mem 0xd0000000-0xd02fffff]

as reported at [1]. Drivers using VRAM helpers may also see reduced
performance as the mapping operations can create overhead.

This patch set fixes the problem by adding a ref counter to the GEM
VRAM buffers' kmap operation, and keeping the fbdev's console buffer
mapped while the console is being displayed. These changes avoid the
frequent mappings in the fbdev code. The drivers, ast and mgag200,
map the console's buffer when it becomes visible and the fbdev code
reuses this mapping. The original fbdev code in ast and mgag200 used
the same strategy.

[1] https://lists.freedesktop.org/archives/dri-devel/2019-September/234308.html

v2:
	* fixed comment typos

Thomas Zimmermann (3):
  drm/vram: Add kmap ref-counting to GEM VRAM objects
  drm/ast: Map fbdev framebuffer while it's being displayed
  drm/mgag200: Map fbdev framebuffer while it's being displayed

 drivers/gpu/drm/ast/ast_mode.c         | 19 +++++++
 drivers/gpu/drm/drm_gem_vram_helper.c  | 74 +++++++++++++++++++-------
 drivers/gpu/drm/mgag200/mgag200_mode.c | 20 +++++++
 include/drm/drm_gem_vram_helper.h      | 19 +++++++
 4 files changed, 114 insertions(+), 18 deletions(-)

--
2.23.0

Comments

Daniel Vetter Sept. 4, 2019, 12:13 p.m. UTC | #1
On Wed, Sep 4, 2019 at 1:56 PM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> (was: drm/vram-helper: Fix performance regression in fbdev)
>
> Generic fbdev emulation maps and unmaps the console BO for updating it's
> content from the shadow buffer. If this involves an actual mapping
> operation (instead of reusing an existing mapping), lots of debug messages
> may be printed, such as
>
>   x86/PAT: Overlap at 0xd0000000-0xd1000000
>   x86/PAT: reserve_memtype added [mem 0xd0000000-0xd02fffff], track write-combining, req write-combining, ret write-combining
>   x86/PAT: free_memtype request [mem 0xd0000000-0xd02fffff]
>
> as reported at [1]. Drivers using VRAM helpers may also see reduced
> performance as the mapping operations can create overhead.
>
> This patch set fixes the problem by adding a ref counter to the GEM
> VRAM buffers' kmap operation, and keeping the fbdev's console buffer
> mapped while the console is being displayed. These changes avoid the
> frequent mappings in the fbdev code. The drivers, ast and mgag200,
> map the console's buffer when it becomes visible and the fbdev code
> reuses this mapping. The original fbdev code in ast and mgag200 used
> the same strategy.
>
> [1] https://lists.freedesktop.org/archives/dri-devel/2019-September/234308.html

As discussed on irc a bit, here's my thoughts:

- imo we should fix this by using the io_mapping stuff, that avoids
the overhead of repeated pat checks for map/unmap. It should also cut
down on remapping costs a lot in general (at least on 64bit kernels,
which is like everything nowadays). But it's a lot more work to roll
out I guess. I think this would be the much better longterm fix.

- this here only works when fbcon is active, the noise will come back
when you start X or wayland. We should probably check whether the
display is active with drm_master_internal_acquire (and reupload when
we restore the entire console in the restore function - could just
launch the worker for that).

I'm also not sure whether we have a real problem here, it's just debug
noise that we're fighting here?
-Daniel

>
> v2:
>         * fixed comment typos
>
> Thomas Zimmermann (3):
>   drm/vram: Add kmap ref-counting to GEM VRAM objects
>   drm/ast: Map fbdev framebuffer while it's being displayed
>   drm/mgag200: Map fbdev framebuffer while it's being displayed
>
>  drivers/gpu/drm/ast/ast_mode.c         | 19 +++++++
>  drivers/gpu/drm/drm_gem_vram_helper.c  | 74 +++++++++++++++++++-------
>  drivers/gpu/drm/mgag200/mgag200_mode.c | 20 +++++++
>  include/drm/drm_gem_vram_helper.h      | 19 +++++++
>  4 files changed, 114 insertions(+), 18 deletions(-)
>
> --
> 2.23.0
>
Davidlohr Bueso Sept. 4, 2019, 5:14 p.m. UTC | #2
On Wed, 04 Sep 2019, Daniel Vetter wrote:
>I'm also not sure whether we have a real problem here, it's just debug
>noise that we're fighting here?

It is non stop debug noise as the memory range in question is being added +
deleted over and over. I doubt we want to be burning cycles like this.

Thanks,
Davidlohr
Daniel Vetter Sept. 4, 2019, 6:43 p.m. UTC | #3
On Wed, Sep 4, 2019 at 7:14 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
> On Wed, 04 Sep 2019, Daniel Vetter wrote:
> >I'm also not sure whether we have a real problem here, it's just debug
> >noise that we're fighting here?
>
> It is non stop debug noise as the memory range in question is being added +
> deleted over and over. I doubt we want to be burning cycles like this.

Yeah the proper fix is setting up an io_mapping in ttm (or drivers) so
the pat tracking is cached, and then using the right pte wrangling
functions. But that's a lot more involved fix, and from all the
testing we've done the pte rewriting itself doesn't seem to be the
biggest issue with mgag200 being slow ...
-Daniel
Gerd Hoffmann Sept. 5, 2019, 7 a.m. UTC | #4
Hi,

> - imo we should fix this by using the io_mapping stuff, that avoids
> the overhead of repeated pat checks for map/unmap.

Another idea:  IIRC ttm has a move_notify callback.  So we could simply
keep mappings active even when the refcount goes down to zero.  Then do
the actual unmap either in the move_notify or in the destroy callback.

cheers,
  Gerd
Daniel Vetter Sept. 5, 2019, 7:56 a.m. UTC | #5
On Thu, Sep 5, 2019 at 9:01 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
>
>   Hi,
>
> > - imo we should fix this by using the io_mapping stuff, that avoids
> > the overhead of repeated pat checks for map/unmap.
>
> Another idea:  IIRC ttm has a move_notify callback.  So we could simply
> keep mappings active even when the refcount goes down to zero.  Then do
> the actual unmap either in the move_notify or in the destroy callback.

Yeah that should be a really clean solution, and only needs changes in
the vram helpers. Which is nice, more common code!
-Daniel
Thomas Zimmermann Sept. 5, 2019, 8:19 a.m. UTC | #6
Hi

Am 05.09.19 um 09:56 schrieb Daniel Vetter:
> On Thu, Sep 5, 2019 at 9:01 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
>>
>>   Hi,
>>
>>> - imo we should fix this by using the io_mapping stuff, that avoids
>>> the overhead of repeated pat checks for map/unmap.
>>
>> Another idea:  IIRC ttm has a move_notify callback.  So we could simply
>> keep mappings active even when the refcount goes down to zero.  Then do
>> the actual unmap either in the move_notify or in the destroy callback.
> 
> Yeah that should be a really clean solution, and only needs changes in
> the vram helpers. Which is nice, more common code!

But the console's BO is special wrt to mapping. Putting special code for
 console handling into vram helpers doesn't seem right. I think it's
preferable to keep the special cases located in fbdev emulation. Or even
better in DRM client code, so that other, future, internal clients
automatically do the right thing.

Best regards
Thomas

> -Daniel
>
Gerd Hoffmann Sept. 5, 2019, 9:29 a.m. UTC | #7
On Thu, Sep 05, 2019 at 10:19:40AM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 05.09.19 um 09:56 schrieb Daniel Vetter:
> > On Thu, Sep 5, 2019 at 9:01 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
> >>
> >>   Hi,
> >>
> >>> - imo we should fix this by using the io_mapping stuff, that avoids
> >>> the overhead of repeated pat checks for map/unmap.
> >>
> >> Another idea:  IIRC ttm has a move_notify callback.  So we could simply
> >> keep mappings active even when the refcount goes down to zero.  Then do
> >> the actual unmap either in the move_notify or in the destroy callback.
> > 
> > Yeah that should be a really clean solution, and only needs changes in
> > the vram helpers. Which is nice, more common code!
> 
> But the console's BO is special wrt to mapping. Putting special code for
>  console handling into vram helpers doesn't seem right.

I have no special handling in mind.  I think we can simply do that for
all gem objects, no matter whenever they are created by fbcon or
userspace (wayland/xorg/whatever).  vmap will create a mapping (or
increase the reference count).  vunmap decreases the reference count,
when it goes down to zero unpin it but keep the mapping.  The actual
unmap happens when ttm wants move the object from VRAM to SYSTEM due to
VRAM being full.  In case vram has room for all our objects we simply
never unmap.

hope this clarifies,
  Gerd
Thomas Zimmermann Sept. 6, 2019, 8:10 a.m. UTC | #8
Hi

Am 05.09.19 um 11:29 schrieb Gerd Hoffmann:
> On Thu, Sep 05, 2019 at 10:19:40AM +0200, Thomas Zimmermann wrote:
>> Hi
>>
>> Am 05.09.19 um 09:56 schrieb Daniel Vetter:
>>> On Thu, Sep 5, 2019 at 9:01 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
>>>>
>>>>   Hi,
>>>>
>>>>> - imo we should fix this by using the io_mapping stuff, that avoids
>>>>> the overhead of repeated pat checks for map/unmap.
>>>>
>>>> Another idea:  IIRC ttm has a move_notify callback.  So we could simply
>>>> keep mappings active even when the refcount goes down to zero.  Then do
>>>> the actual unmap either in the move_notify or in the destroy callback.
>>>
>>> Yeah that should be a really clean solution, and only needs changes in
>>> the vram helpers. Which is nice, more common code!
>>
>> But the console's BO is special wrt to mapping. Putting special code for
>>  console handling into vram helpers doesn't seem right.
> 
> I have no special handling in mind.  I think we can simply do that for
> all gem objects, no matter whenever they are created by fbcon or
> userspace (wayland/xorg/whatever).  vmap will create a mapping (or
> increase the reference count).  vunmap decreases the reference count,
> when it goes down to zero unpin it but keep the mapping.  The actual
> unmap happens when ttm wants move the object from VRAM to SYSTEM due to
> VRAM being full.  In case vram has room for all our objects we simply
> never unmap.

That's pretty cool. Thanks for clarifying. I think it's the solution I
was looking for.

Best regards
Thomas

> 
> hope this clarifies,
>   Gerd
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>