mbox series

[v3,0/2] drm: Add shmem GEM library

Message ID 20180911124400.40588-1-noralf@tronnes.org (mailing list archive)
Headers show
Series drm: Add shmem GEM library | expand

Message

Noralf Trønnes Sept. 11, 2018, 12:43 p.m. UTC
This patchset adds a library for shmem backed GEM objects and makes use
of it in tinydrm.

When I made tinydrm I used the CMA helper because it was very easy to
use. July last year I learned that this limits which drivers to PRIME
import from, since CMA requires continuous memory. tinydrm drivers don't
require that. So I set out to change that looking first at shmem, but
that wasn't working since shmem didn't work with fbdev deferred I/O.
Then I did a vmalloc buffer attempt which worked with deferred I/O, but
maybe wouldn't be of so much use as a library for other drivers to use.
As my work to split out stuff from the CMA helper for shared use came to
an end, I had a generic fbdev emulation that uses a shadow buffer for
deferred I/O.
This means that I can now use shmem buffers after all.

I have looked at the other drivers that use drm_gem_get_pages() and
several supports different cache modes so I've done that even though
tinydrm only uses the cached one.

tinydrm can both use vmalloc and shmem buffers, it doesn't matter as far
as I can see. So the question is what will benefit the rest of DRM the most.

Note:
Sparse has this complaint, but the problem is in kvmalloc_array():
include/linux/mm.h:592:13: error: undefined identifier '__builtin_mul_overflow'

Noralf.

Changes since version 2:
- Grammar (Sam Ravnborg)
- s/drm_gem_shmem_put_pages_unlocked/drm_gem_shmem_put_pages_locked/
  (Sam Ravnborg)
- Add debug ouput in error path (Sam Ravnborg)

Changes since version 1:
- Fix missing argument in docs (kbuild test robot)
- Fix: sparse: expression using sizeof(void) (kbuild test robot)
- Rebasing gave a new checkpatch warning, so I changed to bitfields:
  CHECK: Avoid using bool structure members because of possible alignment
  issues - see: https://lkml.org/lkml/2017/11/21/384
  #834: FILE: include/drm/drm_gem_shmem_helper.h:84:
  +       bool pages_mark_dirty_on_put;
  #841: FILE: include/drm/drm_gem_shmem_helper.h:91:
  +       bool pages_mark_accessed_on_put;

Noralf Trønnes (2):
  drm: Add library for shmem backed GEM objects
  drm/tinydrm: Switch from CMA to shmem buffers

 Documentation/gpu/drm-kms-helpers.rst          |  12 +
 drivers/gpu/drm/Kconfig                        |   6 +
 drivers/gpu/drm/Makefile                       |   1 +
 drivers/gpu/drm/drm_gem_shmem_helper.c         | 678 +++++++++++++++++++++++++
 drivers/gpu/drm/tinydrm/Kconfig                |   2 +-
 drivers/gpu/drm/tinydrm/core/tinydrm-core.c    |  91 +---
 drivers/gpu/drm/tinydrm/core/tinydrm-helpers.c |   5 +
 drivers/gpu/drm/tinydrm/ili9225.c              |  14 +-
 drivers/gpu/drm/tinydrm/ili9341.c              |   6 +-
 drivers/gpu/drm/tinydrm/mi0283qt.c             |   6 +-
 drivers/gpu/drm/tinydrm/mipi-dbi.c             |  38 +-
 drivers/gpu/drm/tinydrm/repaper.c              |  24 +-
 drivers/gpu/drm/tinydrm/st7586.c               |  15 +-
 drivers/gpu/drm/tinydrm/st7735r.c              |   6 +-
 include/drm/drm_gem_shmem_helper.h             | 198 ++++++++
 include/drm/tinydrm/tinydrm.h                  |  36 +-
 16 files changed, 984 insertions(+), 154 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_gem_shmem_helper.c
 create mode 100644 include/drm/drm_gem_shmem_helper.h

Comments

Thomas Hellström (VMware) Sept. 14, 2018, 3:48 p.m. UTC | #1
Hi, Noralf,

On 09/11/2018 02:43 PM, Noralf Trønnes wrote:
> This patchset adds a library for shmem backed GEM objects and makes use
> of it in tinydrm.
>
> When I made tinydrm I used the CMA helper because it was very easy to
> use. July last year I learned that this limits which drivers to PRIME
> import from, since CMA requires continuous memory. tinydrm drivers don't
> require that. So I set out to change that looking first at shmem, but
> that wasn't working since shmem didn't work with fbdev deferred I/O.
> Then I did a vmalloc buffer attempt which worked with deferred I/O, but
> maybe wouldn't be of so much use as a library for other drivers to use.
> As my work to split out stuff from the CMA helper for shared use came to
> an end, I had a generic fbdev emulation that uses a shadow buffer for
> deferred I/O.
> This means that I can now use shmem buffers after all.
>
> I have looked at the other drivers that use drm_gem_get_pages() and
> several supports different cache modes so I've done that even though
> tinydrm only uses the cached one.
Out if interest, how can you use writecombine and uncached with shared 
memory?
as typically the linear kernel map of the affected pages also needs 
changing?

Thanks,
Thomas
Daniel Vetter Sept. 14, 2018, 4:13 p.m. UTC | #2
On Fri, Sep 14, 2018 at 5:48 PM, Thomas Hellstrom <thomas@shipmail.org> wrote:
> Hi, Noralf,
>
> On 09/11/2018 02:43 PM, Noralf Trønnes wrote:
>>
>> This patchset adds a library for shmem backed GEM objects and makes use
>> of it in tinydrm.
>>
>> When I made tinydrm I used the CMA helper because it was very easy to
>> use. July last year I learned that this limits which drivers to PRIME
>> import from, since CMA requires continuous memory. tinydrm drivers don't
>> require that. So I set out to change that looking first at shmem, but
>> that wasn't working since shmem didn't work with fbdev deferred I/O.
>> Then I did a vmalloc buffer attempt which worked with deferred I/O, but
>> maybe wouldn't be of so much use as a library for other drivers to use.
>> As my work to split out stuff from the CMA helper for shared use came to
>> an end, I had a generic fbdev emulation that uses a shadow buffer for
>> deferred I/O.
>> This means that I can now use shmem buffers after all.
>>
>> I have looked at the other drivers that use drm_gem_get_pages() and
>> several supports different cache modes so I've done that even though
>> tinydrm only uses the cached one.
>
> Out if interest, how can you use writecombine and uncached with shared
> memory?
> as typically the linear kernel map of the affected pages also needs
> changing?

I think on x86 at least the core code takes care of that. On arm, I'm
not sure this just works, since you can't change the mode of the
linear map. Instead you need specially allocated memory which is _not_
in the linear map. I guess arm boxes with pcie slots aren't that
common yet :-)
-Daniel

>
> Thanks,
> Thomas
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Noralf Trønnes Sept. 14, 2018, 4:51 p.m. UTC | #3
Den 14.09.2018 18.13, skrev Daniel Vetter:
> On Fri, Sep 14, 2018 at 5:48 PM, Thomas Hellstrom <thomas@shipmail.org> wrote:
>> Hi, Noralf,
>>
>> On 09/11/2018 02:43 PM, Noralf Trønnes wrote:
>>> This patchset adds a library for shmem backed GEM objects and makes use
>>> of it in tinydrm.
>>>
>>> When I made tinydrm I used the CMA helper because it was very easy to
>>> use. July last year I learned that this limits which drivers to PRIME
>>> import from, since CMA requires continuous memory. tinydrm drivers don't
>>> require that. So I set out to change that looking first at shmem, but
>>> that wasn't working since shmem didn't work with fbdev deferred I/O.
>>> Then I did a vmalloc buffer attempt which worked with deferred I/O, but
>>> maybe wouldn't be of so much use as a library for other drivers to use.
>>> As my work to split out stuff from the CMA helper for shared use came to
>>> an end, I had a generic fbdev emulation that uses a shadow buffer for
>>> deferred I/O.
>>> This means that I can now use shmem buffers after all.
>>>
>>> I have looked at the other drivers that use drm_gem_get_pages() and
>>> several supports different cache modes so I've done that even though
>>> tinydrm only uses the cached one.
>> Out if interest, how can you use writecombine and uncached with shared
>> memory?
>> as typically the linear kernel map of the affected pages also needs
>> changing?
> I think on x86 at least the core code takes care of that. On arm, I'm
> not sure this just works, since you can't change the mode of the
> linear map. Instead you need specially allocated memory which is _not_
> in the linear map. I guess arm boxes with pcie slots aren't that
> common yet :-)

I was hoping to get some feedback on these cache mode flags.

These drivers use them:
   udl/udl_gem.c
   omapdrm/omap_gem.c
   msm/msm_gem.c
   etnaviv/etnaviv_gem.c

I don't know if they make sense or not, so any help is appreciated.

Noralf.

> -Daniel
>
>> Thanks,
>> Thomas
>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
Thomas Hellström (VMware) Sept. 14, 2018, 5:11 p.m. UTC | #4
On 09/14/2018 06:51 PM, Noralf Trønnes wrote:
>
> Den 14.09.2018 18.13, skrev Daniel Vetter:
>> On Fri, Sep 14, 2018 at 5:48 PM, Thomas Hellstrom 
>> <thomas@shipmail.org> wrote:
>>> Hi, Noralf,
>>>
>>> On 09/11/2018 02:43 PM, Noralf Trønnes wrote:
>>>> This patchset adds a library for shmem backed GEM objects and makes 
>>>> use
>>>> of it in tinydrm.
>>>>
>>>> When I made tinydrm I used the CMA helper because it was very easy to
>>>> use. July last year I learned that this limits which drivers to PRIME
>>>> import from, since CMA requires continuous memory. tinydrm drivers 
>>>> don't
>>>> require that. So I set out to change that looking first at shmem, but
>>>> that wasn't working since shmem didn't work with fbdev deferred I/O.
>>>> Then I did a vmalloc buffer attempt which worked with deferred I/O, 
>>>> but
>>>> maybe wouldn't be of so much use as a library for other drivers to 
>>>> use.
>>>> As my work to split out stuff from the CMA helper for shared use 
>>>> came to
>>>> an end, I had a generic fbdev emulation that uses a shadow buffer for
>>>> deferred I/O.
>>>> This means that I can now use shmem buffers after all.
>>>>
>>>> I have looked at the other drivers that use drm_gem_get_pages() and
>>>> several supports different cache modes so I've done that even though
>>>> tinydrm only uses the cached one.
>>> Out if interest, how can you use writecombine and uncached with shared
>>> memory?
>>> as typically the linear kernel map of the affected pages also needs
>>> changing?
>> I think on x86 at least the core code takes care of that. On arm, I'm
>> not sure this just works, since you can't change the mode of the
>> linear map. Instead you need specially allocated memory which is _not_
>> in the linear map. I guess arm boxes with pcie slots aren't that
>> common yet :-)
>
> I was hoping to get some feedback on these cache mode flags.
>
> These drivers use them:
>   udl/udl_gem.c
>   omapdrm/omap_gem.c
>   msm/msm_gem.c
>   etnaviv/etnaviv_gem.c
>
> I don't know if they make sense or not, so any help is appreciated.

It's possible, as Daniel says, that the X86 PAT system now automatically 
tracks all pte inserts and changes the linear kernel map accordingly. It 
certainly didn't use to do that, so for example TTM does that 
explicitly. And it's a very slow operation since it involves a global 
cache- and TLB flush across all cores. So if PAT is doing that on a 
per-page basis, it's probably bound to be very slow.

The concern with shmem (last time I looked which was a couple of years 
ago, i admit) was that shmem needs the linear kernel map to copy data in 
and out when swapping and on hibernate. If the above drivers that you 
mention don't use shmem, that's all fine, but the combination of shmem 
and special memory that is NOT mapped in the kernel memory does look a 
bit odd to me.

/Thomas


>
> Noralf.
>
>> -Daniel
>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>>
Noralf Trønnes Sept. 14, 2018, 6:09 p.m. UTC | #5
Den 14.09.2018 19.11, skrev Thomas Hellstrom:
> On 09/14/2018 06:51 PM, Noralf Trønnes wrote:
>>
>> Den 14.09.2018 18.13, skrev Daniel Vetter:
>>> On Fri, Sep 14, 2018 at 5:48 PM, Thomas Hellstrom 
>>> <thomas@shipmail.org> wrote:
>>>> Hi, Noralf,
>>>>
>>>> On 09/11/2018 02:43 PM, Noralf Trønnes wrote:
>>>>> This patchset adds a library for shmem backed GEM objects and 
>>>>> makes use
>>>>> of it in tinydrm.
>>>>>
>>>>> When I made tinydrm I used the CMA helper because it was very easy to
>>>>> use. July last year I learned that this limits which drivers to PRIME
>>>>> import from, since CMA requires continuous memory. tinydrm drivers 
>>>>> don't
>>>>> require that. So I set out to change that looking first at shmem, but
>>>>> that wasn't working since shmem didn't work with fbdev deferred I/O.
>>>>> Then I did a vmalloc buffer attempt which worked with deferred 
>>>>> I/O, but
>>>>> maybe wouldn't be of so much use as a library for other drivers to 
>>>>> use.
>>>>> As my work to split out stuff from the CMA helper for shared use 
>>>>> came to
>>>>> an end, I had a generic fbdev emulation that uses a shadow buffer for
>>>>> deferred I/O.
>>>>> This means that I can now use shmem buffers after all.
>>>>>
>>>>> I have looked at the other drivers that use drm_gem_get_pages() and
>>>>> several supports different cache modes so I've done that even though
>>>>> tinydrm only uses the cached one.
>>>> Out if interest, how can you use writecombine and uncached with shared
>>>> memory?
>>>> as typically the linear kernel map of the affected pages also needs
>>>> changing?
>>> I think on x86 at least the core code takes care of that. On arm, I'm
>>> not sure this just works, since you can't change the mode of the
>>> linear map. Instead you need specially allocated memory which is _not_
>>> in the linear map. I guess arm boxes with pcie slots aren't that
>>> common yet :-)
>>
>> I was hoping to get some feedback on these cache mode flags.
>>
>> These drivers use them:
>>   udl/udl_gem.c
>>   omapdrm/omap_gem.c
>>   msm/msm_gem.c
>>   etnaviv/etnaviv_gem.c
>>
>> I don't know if they make sense or not, so any help is appreciated.
>
> It's possible, as Daniel says, that the X86 PAT system now 
> automatically tracks all pte inserts and changes the linear kernel map 
> accordingly. It certainly didn't use to do that, so for example TTM 
> does that explicitly. And it's a very slow operation since it involves 
> a global cache- and TLB flush across all cores. So if PAT is doing 
> that on a per-page basis, it's probably bound to be very slow.
>
> The concern with shmem (last time I looked which was a couple of years 
> ago, i admit) was that shmem needs the linear kernel map to copy data 
> in and out when swapping and on hibernate. If the above drivers that 
> you mention don't use shmem, that's all fine, but the combination of 
> shmem and special memory that is NOT mapped in the kernel memory does 
> look a bit odd to me.
>

When I started writing this helper I first looked at udl which has
different cache modes, but only uses shmem. It does hardcode cache mode
to cached though. Based on what you say I'm starting to think that maybe
the udl GEM code was just copied from some other driver and not cleaned
up properly afterwards.

In addition to shmem, msm and etnaviv use vram and omap uses
dma_alloc_wc(). So the cache modes are for distinguishing the memory
types I suppose.

They all have this type of code in their *_mmap_obj() function (with the
same comment):

     if (msm_obj->flags & MSM_BO_WC) {
         vma->vm_page_prot = 
pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
     } else if (msm_obj->flags & MSM_BO_UNCACHED) {
         vma->vm_page_prot = 
pgprot_noncached(vm_get_page_prot(vma->vm_flags));
     } else {
         /*
          * Shunt off cached objs to shmem file so they have their own
          * address_space (so unmap_mapping_range does what we want,
          * in particular in the case of mmap'd dmabufs)
          */
         fput(vma->vm_file);
         get_file(obj->filp);
         vma->vm_pgoff = 0;
         vma->vm_file  = obj->filp;

         vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
     }

So now it's starting to make more sense to me. It seems that shmem is the
cached mode and that WC and uncached is used for the other memory types.

If I've understood this correctly, it means that I can just drop the
cache modes.

Thanks,
Noralf.

> /Thomas
>
>
>>
>> Noralf.
>>
>>> -Daniel
>>>
>>>> Thanks,
>>>> Thomas
>>>>
>>>>
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>>>
>
Thomas Hellström (VMware) Sept. 14, 2018, 8:13 p.m. UTC | #6
Hi
On 09/14/2018 08:09 PM, Noralf Trønnes wrote:
>
> Den 14.09.2018 19.11, skrev Thomas Hellstrom:
>> On 09/14/2018 06:51 PM, Noralf Trønnes wrote:
>>>
>>> Den 14.09.2018 18.13, skrev Daniel Vetter:
>>>> On Fri, Sep 14, 2018 at 5:48 PM, Thomas Hellstrom 
>>>> <thomas@shipmail.org> wrote:
>>>>> Hi, Noralf,
>>>>>
>>>>> On 09/11/2018 02:43 PM, Noralf Trønnes wrote:
>>>>>> This patchset adds a library for shmem backed GEM objects and 
>>>>>> makes use
>>>>>> of it in tinydrm.
>>>>>>
>>>>>> When I made tinydrm I used the CMA helper because it was very 
>>>>>> easy to
>>>>>> use. July last year I learned that this limits which drivers to 
>>>>>> PRIME
>>>>>> import from, since CMA requires continuous memory. tinydrm 
>>>>>> drivers don't
>>>>>> require that. So I set out to change that looking first at shmem, 
>>>>>> but
>>>>>> that wasn't working since shmem didn't work with fbdev deferred I/O.
>>>>>> Then I did a vmalloc buffer attempt which worked with deferred 
>>>>>> I/O, but
>>>>>> maybe wouldn't be of so much use as a library for other drivers 
>>>>>> to use.
>>>>>> As my work to split out stuff from the CMA helper for shared use 
>>>>>> came to
>>>>>> an end, I had a generic fbdev emulation that uses a shadow buffer 
>>>>>> for
>>>>>> deferred I/O.
>>>>>> This means that I can now use shmem buffers after all.
>>>>>>
>>>>>> I have looked at the other drivers that use drm_gem_get_pages() and
>>>>>> several supports different cache modes so I've done that even though
>>>>>> tinydrm only uses the cached one.
>>>>> Out if interest, how can you use writecombine and uncached with 
>>>>> shared
>>>>> memory?
>>>>> as typically the linear kernel map of the affected pages also needs
>>>>> changing?
>>>> I think on x86 at least the core code takes care of that. On arm, I'm
>>>> not sure this just works, since you can't change the mode of the
>>>> linear map. Instead you need specially allocated memory which is _not_
>>>> in the linear map. I guess arm boxes with pcie slots aren't that
>>>> common yet :-)
>>>
>>> I was hoping to get some feedback on these cache mode flags.
>>>
>>> These drivers use them:
>>>   udl/udl_gem.c
>>>   omapdrm/omap_gem.c
>>>   msm/msm_gem.c
>>>   etnaviv/etnaviv_gem.c
>>>
>>> I don't know if they make sense or not, so any help is appreciated.
>>
>> It's possible, as Daniel says, that the X86 PAT system now 
>> automatically tracks all pte inserts and changes the linear kernel 
>> map accordingly. It certainly didn't use to do that, so for example 
>> TTM does that explicitly. And it's a very slow operation since it 
>> involves a global cache- and TLB flush across all cores. So if PAT is 
>> doing that on a per-page basis, it's probably bound to be very slow.
>>
>> The concern with shmem (last time I looked which was a couple of 
>> years ago, i admit) was that shmem needs the linear kernel map to 
>> copy data in and out when swapping and on hibernate. If the above 
>> drivers that you mention don't use shmem, that's all fine, but the 
>> combination of shmem and special memory that is NOT mapped in the 
>> kernel memory does look a bit odd to me.
>>
>
> When I started writing this helper I first looked at udl which has
> different cache modes, but only uses shmem. It does hardcode cache mode
> to cached though. Based on what you say I'm starting to think that maybe
> the udl GEM code was just copied from some other driver and not cleaned
> up properly afterwards.
>
> In addition to shmem, msm and etnaviv use vram and omap uses
> dma_alloc_wc(). So the cache modes are for distinguishing the memory
> types I suppose.
>
> They all have this type of code in their *_mmap_obj() function (with the
> same comment):
>
>     if (msm_obj->flags & MSM_BO_WC) {
>         vma->vm_page_prot = 
> pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
>     } else if (msm_obj->flags & MSM_BO_UNCACHED) {
>         vma->vm_page_prot = 
> pgprot_noncached(vm_get_page_prot(vma->vm_flags));
>     } else {
>         /*
>          * Shunt off cached objs to shmem file so they have their own
>          * address_space (so unmap_mapping_range does what we want,
>          * in particular in the case of mmap'd dmabufs)
>          */
>         fput(vma->vm_file);
>         get_file(obj->filp);
>         vma->vm_pgoff = 0;
>         vma->vm_file  = obj->filp;
>
>         vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>     }
>
> So now it's starting to make more sense to me. It seems that shmem is the
> cached mode and that WC and uncached is used for the other memory types.
>
> If I've understood this correctly, it means that I can just drop the
> cache modes.
>

Yes, I think that might be a good idea, at least until there is a clear 
understanding how and if the linear kernel map is handled, and it's 
probably different on each architecture.  Best would of course be if we 
were allowed to set up different mappings to a page with conflicting 
cache modes...

/Thomas




> Thanks,
> Noralf.
>
>> /Thomas
>>
>>
>>>
>>> Noralf.
>>>
>>>> -Daniel
>>>>
>>>>> Thanks,
>>>>> Thomas
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>
>>>>
>>