[RFC,0/5] GEM buffer memory tracking

Message ID: 20220909111640.3789791-1-l.stach@pengutronix.de

Lucas Stach Sept. 9, 2022, 11:16 a.m. UTC
Hi MM and DRM people,

During the discussions about per-file OOM badness [1] it repeatedly came up
that it should be possible to simply track the DRM GEM memory usage by some
new MM counters.

The basic problem statement is as follows: in the DRM subsystem, drivers can
allocate buffers, a.k.a. GEM objects, on behalf of a userspace process. In
many cases those buffers behave just like anonymous memory, but they may be
used only by the devices driven by the DRM drivers. As the buffers can be
quite large (multi-MB is the norm rather than the exception), userspace will
not map/fault them into the process address space when it doesn't need access
to the content of the buffers. Thus the memory used by those buffers is not
accounted to any process and is invisible to the usual userspace tools and to
the OOM handling.

This series tries to remedy this situation by making such memory visible
by accounting it exclusively to the process that created the GEM object.
For now it only hooks up the tracking to the CMA helpers and the etnaviv
driver, which was enough for me to prove the concept and see it actually
working. Other drivers could follow if the proposal sounds sane.
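
To make the accounting idea concrete, here is a minimal userspace sketch in
C. It is not the code from this series: struct toy_mm, struct toy_gem_object,
toy_gem_account() and toy_gem_unaccount() are hypothetical names, and a 4 KiB
page size is assumed. It only models the bookkeeping, where the full object
size is charged to the creating process and credited back when the object
goes away.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on most common configurations. */
#define TOY_PAGE_SHIFT 12

/* Toy stand-in for the per-process memory descriptor. */
struct toy_mm {
	long driver_pages;	/* models the proposed MM_DRIVERPAGES counter */
};

/* Toy stand-in for a GEM object. */
struct toy_gem_object {
	size_t size;		/* object size in bytes */
	struct toy_mm *mm;	/* process the memory is accounted to, or NULL */
};

/* Charge the full object size to the process that created the object. */
static void toy_gem_account(struct toy_gem_object *obj, struct toy_mm *creator)
{
	assert(obj->mm == NULL);	/* catch double accounting */
	obj->mm = creator;
	creator->driver_pages += (long)(obj->size >> TOY_PAGE_SHIFT);
}

/* Credit the pages back, e.g. when the object is destroyed. */
static void toy_gem_unaccount(struct toy_gem_object *obj)
{
	if (!obj->mm)
		return;
	obj->mm->driver_pages -= (long)(obj->size >> TOY_PAGE_SHIFT);
	obj->mm = NULL;
}
```

In this model, accounting an 8 MiB object against a zeroed toy_mm raises
driver_pages to 2048, and unaccounting brings it back to 0.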

Known shortcomings of this very simplistic implementation:

1. GEM objects can be shared between processes by exporting/importing them
as dma-bufs. When they are shared between multiple processes, killing the
process that got the memory accounted will not actually free the memory, as
the object is kept alive by the sharing process.

2. It currently only accounts the full size of the GEM object. More advanced
devices/drivers may only sparsely populate the backing storage of the object
as needed. This could be solved by having more granular accounting.
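
As a rough illustration of what "more granular accounting" in point 2 could
mean, the following userspace C sketch charges pages only as backing storage
is populated, rather than charging the full object size up front. All names
(struct toy_mm, struct toy_sparse_object, toy_populate(), toy_release()) are
hypothetical and do not exist in the kernel or in this series.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the per-process memory descriptor. */
struct toy_mm {
	long driver_pages;	/* models the proposed MM_DRIVERPAGES counter */
};

/* Toy object whose backing storage is filled in on demand. */
struct toy_sparse_object {
	size_t total_pages;	/* size of the object in pages */
	size_t populated_pages;	/* pages that actually have backing storage */
	struct toy_mm *mm;	/* process the memory is accounted to */
};

/* Charge only the pages that were just populated. */
static void toy_populate(struct toy_sparse_object *obj, size_t npages)
{
	assert(obj->populated_pages + npages <= obj->total_pages);
	obj->populated_pages += npages;
	obj->mm->driver_pages += (long)npages;
}

/* Drop all backing storage and credit exactly what was charged. */
static void toy_release(struct toy_sparse_object *obj)
{
	obj->mm->driver_pages -= (long)obj->populated_pages;
	obj->populated_pages = 0;
}
```

With a 4096-page object of which only 16 pages are populated, this model
charges 16 pages, where full-size accounting would charge all 4096.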

I would like to invite everyone to poke holes into this proposal to see if
this might get us on the right trajectory to finally track GEM memory usage
or if it (again) falls short and doesn't satisfy the requirements we have
for graphics memory tracking.

Regards,
Lucas

[1] https://lore.kernel.org/linux-mm/20220531100007.174649-1-christian.koenig@amd.com/

Lucas Stach (5):
  mm: add MM_DRIVERPAGES
  drm/gem: track mm struct of allocating process in gem object
  drm/gem: add functions to account GEM object memory usage
  drm/cma-helper: account memory used by CMA GEM objects
  drm/etnaviv: account memory used by GEM buffers

 drivers/gpu/drm/drm_gem.c             | 42 +++++++++++++++++++++++++++
 drivers/gpu/drm/drm_gem_cma_helper.c  |  4 +++
 drivers/gpu/drm/etnaviv/etnaviv_gem.c |  3 ++
 fs/proc/task_mmu.c                    |  6 ++--
 include/drm/drm_gem.h                 | 15 ++++++++++
 include/linux/mm.h                    |  3 +-
 include/linux/mm_types_task.h         |  1 +
 kernel/fork.c                         |  1 +
 8 files changed, 72 insertions(+), 3 deletions(-)

Comments

Christian König Sept. 9, 2022, 11:32 a.m. UTC | #1
On 09.09.22 at 13:16, Lucas Stach wrote:
> Hi MM and DRM people,
>
> During the discussions about per-file OOM badness [1] it repeatedly came up
> that it should be possible to simply track the DRM GEM memory usage by some
> new MM counters.
>
> The basic problem statement is as follows: in the DRM subsystem, drivers can
> allocate buffers, a.k.a. GEM objects, on behalf of a userspace process. In
> many cases those buffers behave just like anonymous memory, but they may be
> used only by the devices driven by the DRM drivers. As the buffers can be
> quite large (multi-MB is the norm rather than the exception), userspace will
> not map/fault them into the process address space when it doesn't need access
> to the content of the buffers. Thus the memory used by those buffers is not
> accounted to any process and is invisible to the usual userspace tools and to
> the OOM handling.
>
> This series tries to remedy this situation by making such memory visible
> by accounting it exclusively to the process that created the GEM object.
> For now it only hooks up the tracking to the CMA helpers and the etnaviv
> driver, which was enough for me to prove the concept and see it actually
> working. Other drivers could follow if the proposal sounds sane.
>
> Known shortcomings of this very simplistic implementation:
>
> 1. GEM objects can be shared between processes by exporting/importing them
> as dma-bufs. When they are shared between multiple processes, killing the
> process that got the memory accounted will not actually free the memory, as
> the object is kept alive by the sharing process.
>
> 2. It currently only accounts the full size of the GEM object. More advanced
> devices/drivers may only sparsely populate the backing storage of the object
> as needed. This could be solved by having more granular accounting.
>
> I would like to invite everyone to poke holes into this proposal to see if
> this might get us on the right trajectory to finally track GEM memory usage
> or if it (again) falls short and doesn't satisfy the requirements we have
> for graphics memory tracking.

Good to see others looking into this problem as well, since I haven't had
time for it recently.

I've tried this approach as well, but was quickly shot down by the 
forking behavior of the core kernel.

The problem is that the MM counters get copied over to child processes
and because of that become imbalanced when the child process
terminates.

What you could do is change the forking behavior for MM_DRIVERPAGES
so that it always stays with the process which initially allocated
the memory and never leaks to children.
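
A toy userspace model in C of the imbalance and of the suggested fix, taking
the description above at face value (the child starts out with a copy of the
parent's counters). The names struct toy_mm, toy_fork() and
toy_exit_leftover() are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-process memory descriptor. */
struct toy_mm {
	long driver_pages;	/* models the proposed MM_DRIVERPAGES counter */
};

/*
 * Model of fork: the child starts with a copy of the parent's counters.
 * With keep_with_allocator set, the driver counter is not inherited and
 * stays with the process that allocated the memory.
 */
static struct toy_mm toy_fork(const struct toy_mm *parent,
			      bool keep_with_allocator)
{
	struct toy_mm child = *parent;

	assert(parent->driver_pages >= 0);
	if (keep_with_allocator)
		child.driver_pages = 0;
	return child;
}

/*
 * Model of exit: the GEM objects only point at the allocating process,
 * so nothing ever subtracts pages from the child. Whatever is left in
 * its counter at exit is the imbalance.
 */
static long toy_exit_leftover(const struct toy_mm *mm)
{
	return mm->driver_pages;
}
```

With keep_with_allocator false, a child forked from a parent holding 2048
driver pages exits with 2048 pages still on its counter. With it set to true
the child exits with 0 and the parent keeps its 2048.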

Apart from that I suggest renaming it, since the shmemfd and a few other
implementations have pretty much the same problem.

Regards,
Christian.
