mbox series

[RFC,v2,0/6] Introduce sparse DRM shmem object allocations

Message ID 20250326021433.772196-1-adrian.larumbe@collabora.com (mailing list archive)
Headers show
Series Introduce sparse DRM shmem object allocations | expand

Message

Adrián Larumbe March 26, 2025, 2:14 a.m. UTC
This patch series is a proposal for implementing sparse page allocations
for shmem objects. It was initially motivated by a kind of BO managed by
the Panfrost driver, the tiler heap, which grows on demand every time the
GPU faults on a virtual address within its drm_mm-managed ranged.

Because keeping a struct page pointer array that can describe the entire
virtual range is wasteful when only a few backing pages have been
allocated, at Collabora we thought a sparse allocation approach with
xarrays was a more efficient choice.

Since sparse and 'dense' DRM shmem objects must be managed slightly
differently, the API is expanded to allow client drivers to create sparse
objects and also to expand their page backing range, but everything else
should remain as transparent as possible and be handled from within the DRM
shmem system itself.

Discussion of previus revision can be found here:
https://lore.kernel.org/dri-devel/20250218232552.3450939-1-adrian.larumbe@collabora.com/

Changelog:
 v2:
  - Removed patch with helper for non-blocking shmem page allocations.
  - Moved page_array definitions away from scatterlist interface to hide
  them from consumers.
  - Refactored sg_alloc_append_table_from_pages() so that it now calls
  sg_alloc_append_table_from_page_array() to avoid code duplication.
  - Undid extension of __drm_gem_shmem_create() argument list so that a sparse
  shmem object is now fully defined in a parent function.
  - Moved check for absence of backing pages when putting an object into
  drm_gem_shmem_put_pages()
  - Added explanatory comments above DRM WARN'ings across yet unimplemented
  shmem code paths, like kernel vmap's and UM mappings of sparse objects
  - Created drm_gem helper for doing the actual sparse allocation, to keep
  the interface aligned with the existing one with regular shmem objects.
  - Split the body of drm_gem_shmem_get_sparse_pages_locked() into two separate
  functions, one which performs the actual page allocation, and another
  one that retrieves an sgtable.
  - Expanded the argument list of drm_gem_shmem_get_sparse_pages() and its
  children functions so that it takes an gfp mask, in the even that we would
  want to do non-blocking allocations, for instance like when we wish to
  avoid races with the shrinker memory reclaim path.
  - Created shmem helper that returns whether an shmem object has any backing pages.

TODO:
The following items need to be worked on, and will be the subject of a v3 of this RFC:

 - Handle the special case when some of the pages in a sparse allocation range are
   already present, rather than bailing out immediately.
 - Redefining panfrost_gem_object::sgts into an xarray or perhaps a sg_append_table
   to avoid memory waste in allocating more sgtable pointers than we could need.
 - Deciding on the rules for sparse shmem object's kmaps and UM maps.

Adrián Larumbe (6):
  lib/scatterlist.c: Support constructing sgt from page xarray
  drm/shmem: Introduce the notion of sparse objects
  drm/shmem: Implement sparse allocation of pages for shmem objects
  drm/panfrost: Use shmem sparse allocation for heap BOs
  drm/shmem: Add a helper to check object's page backing status
  drm/panfrost/panthor: Take sparse objects into account for fdinfo

 drivers/gpu/drm/drm_gem.c               | 117 +++++++++++
 drivers/gpu/drm/drm_gem_shmem_helper.c  | 264 +++++++++++++++++++++++-
 drivers/gpu/drm/panfrost/panfrost_gem.c |  14 +-
 drivers/gpu/drm/panfrost/panfrost_gem.h |   2 +-
 drivers/gpu/drm/panfrost/panfrost_mmu.c |  86 ++------
 drivers/gpu/drm/panthor/panthor_gem.c   |   2 +-
 include/drm/drm_gem.h                   |   6 +
 include/drm/drm_gem_shmem_helper.h      |  29 ++-
 include/linux/scatterlist.h             |  17 ++
 lib/scatterlist.c                       | 175 +++++++++++-----
 10 files changed, 579 insertions(+), 133 deletions(-)


base-commit: 2f9d51740cc30e0d2c8a23a55b1e20cf2513c250
--
2.48.1

Comments

Thomas Zimmermann March 31, 2025, 7:13 a.m. UTC | #1
Hi

Am 26.03.25 um 03:14 schrieb Adrián Larumbe:
> This patch series is a proposal for implementing sparse page allocations
> for shmem objects. It was initially motivated by a kind of BO managed by
> the Panfrost driver, the tiler heap, which grows on demand every time the
> GPU faults on a virtual address within its drm_mm-managed ranged.

I've looked at panfrost_gem.h and found that the driver's gem structure 
has grown quite a bit. It seems to have outgrown gem-shmem already.  I 
think you should consider pulling a copy of gem-shmem into the driver 
and building a dedicated memory manager on top.

Best regards
Thomas

>
> Because keeping a struct page pointer array that can describe the entire
> virtual range is wasteful when only a few backing pages have been
> allocated, at Collabora we thought a sparse allocation approach with
> xarrays was a more efficient choice.
>
> Since sparse and 'dense' DRM shmem objects must be managed slightly
> differently, the API is expanded to allow client drivers to create sparse
> objects and also to expand their page backing range, but everything else
> should remain as transparent as possible and be handled from within the DRM
> shmem system itself.
>
> Discussion of previus revision can be found here:
> https://lore.kernel.org/dri-devel/20250218232552.3450939-1-adrian.larumbe@collabora.com/
>
> Changelog:
>   v2:
>    - Removed patch with helper for non-blocking shmem page allocations.
>    - Moved page_array definitions away from scatterlist interface to hide
>    them from consumers.
>    - Refactored sg_alloc_append_table_from_pages() so that it now calls
>    sg_alloc_append_table_from_page_array() to avoid code duplication.
>    - Undid extension of __drm_gem_shmem_create() argument list so that a sparse
>    shmem object is now fully defined in a parent function.
>    - Moved check for absence of backing pages when putting an object into
>    drm_gem_shmem_put_pages()
>    - Added explanatory comments above DRM WARN'ings across yet unimplemented
>    shmem code paths, like kernel vmap's and UM mappings of sparse objects
>    - Created drm_gem helper for doing the actual sparse allocation, to keep
>    the interface aligned with the existing one with regular shmem objects.
>    - Split the body of drm_gem_shmem_get_sparse_pages_locked() into two separate
>    functions, one which performs the actual page allocation, and another
>    one that retrieves an sgtable.
>    - Expanded the argument list of drm_gem_shmem_get_sparse_pages() and its
>    children functions so that it takes an gfp mask, in the even that we would
>    want to do non-blocking allocations, for instance like when we wish to
>    avoid races with the shrinker memory reclaim path.
>    - Created shmem helper that returns whether an shmem object has any backing pages.
>
> TODO:
> The following items need to be worked on, and will be the subject of a v3 of this RFC:
>
>   - Handle the special case when some of the pages in a sparse allocation range are
>     already present, rather than bailing out immediately.
>   - Redefining panfrost_gem_object::sgts into an xarray or perhaps a sg_append_table
>     to avoid memory waste in allocating more sgtable pointers than we could need.
>   - Deciding on the rules for sparse shmem object's kmaps and UM maps.
>
> Adrián Larumbe (6):
>    lib/scatterlist.c: Support constructing sgt from page xarray
>    drm/shmem: Introduce the notion of sparse objects
>    drm/shmem: Implement sparse allocation of pages for shmem objects
>    drm/panfrost: Use shmem sparse allocation for heap BOs
>    drm/shmem: Add a helper to check object's page backing status
>    drm/panfrost/panthor: Take sparse objects into account for fdinfo
>
>   drivers/gpu/drm/drm_gem.c               | 117 +++++++++++
>   drivers/gpu/drm/drm_gem_shmem_helper.c  | 264 +++++++++++++++++++++++-
>   drivers/gpu/drm/panfrost/panfrost_gem.c |  14 +-
>   drivers/gpu/drm/panfrost/panfrost_gem.h |   2 +-
>   drivers/gpu/drm/panfrost/panfrost_mmu.c |  86 ++------
>   drivers/gpu/drm/panthor/panthor_gem.c   |   2 +-
>   include/drm/drm_gem.h                   |   6 +
>   include/drm/drm_gem_shmem_helper.h      |  29 ++-
>   include/linux/scatterlist.h             |  17 ++
>   lib/scatterlist.c                       | 175 +++++++++++-----
>   10 files changed, 579 insertions(+), 133 deletions(-)
>
>
> base-commit: 2f9d51740cc30e0d2c8a23a55b1e20cf2513c250
> --
> 2.48.1
Boris Brezillon March 31, 2025, 8:31 a.m. UTC | #2
Hi Thomas,

On Mon, 31 Mar 2025 09:13:59 +0200
Thomas Zimmermann <tzimmermann@suse.de> wrote:

> Hi
> 
> Am 26.03.25 um 03:14 schrieb Adrián Larumbe:
> > This patch series is a proposal for implementing sparse page allocations
> > for shmem objects. It was initially motivated by a kind of BO managed by
> > the Panfrost driver, the tiler heap, which grows on demand every time the
> > GPU faults on a virtual address within its drm_mm-managed ranged.  
> 
> I've looked at panfrost_gem.h and found that the driver's gem structure 
> has grown quite a bit. It seems to have outgrown gem-shmem already.  I 
> think you should consider pulling a copy of gem-shmem into the driver 
> and building a dedicated memory manager on top.

Actually, it's not just something we need for panfrost. I plan to use
the same non-blocking allocation mechanism for panthor's heap
chunks/buffers, and lima could use it for its heap buffers too. The
non-blocking page allocation is also something i915 has been
open-coding here [1], and I believe that some of this logic could
(and should IMHO) live in common code rather than each driver coming
with its own solution, thus increasing the risk of bugs/inconsistencies.
That's even more important if we provide a common gem-shmem shrinker
like Dmitry's has been proposing.

Best Regards,

Boris

[1]https://elixir.bootlin.com/linux/v6.13.7/source/drivers/gpu/drm/i915/gem/i915_gem_shmem.c#L89
Thomas Zimmermann March 31, 2025, 9:12 a.m. UTC | #3
Hi

Am 31.03.25 um 10:31 schrieb Boris Brezillon:
> Hi Thomas,
>
> On Mon, 31 Mar 2025 09:13:59 +0200
> Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
>> Hi
>>
>> Am 26.03.25 um 03:14 schrieb Adrián Larumbe:
>>> This patch series is a proposal for implementing sparse page allocations
>>> for shmem objects. It was initially motivated by a kind of BO managed by
>>> the Panfrost driver, the tiler heap, which grows on demand every time the
>>> GPU faults on a virtual address within its drm_mm-managed ranged.
>> I've looked at panfrost_gem.h and found that the driver's gem structure
>> has grown quite a bit. It seems to have outgrown gem-shmem already.  I
>> think you should consider pulling a copy of gem-shmem into the driver
>> and building a dedicated memory manager on top.
> Actually, it's not just something we need for panfrost. I plan to use
> the same non-blocking allocation mechanism for panthor's heap
> chunks/buffers, and lima could use it for its heap buffers too. The
> non-blocking page allocation is also something i915 has been
> open-coding here [1], and I believe that some of this logic could
> (and should IMHO) live in common code rather than each driver coming
> with its own solution, thus increasing the risk of bugs/inconsistencies.
> That's even more important if we provide a common gem-shmem shrinker
> like Dmitry's has been proposing.

Thanks for the note.

Best regards
Thomas

>
> Best Regards,
>
> Boris
>
> [1]https://elixir.bootlin.com/linux/v6.13.7/source/drivers/gpu/drm/i915/gem/i915_gem_shmem.c#L89