Message ID | 20230109213809.418135-2-tjmercier@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | Track exported dma-buffers with memcg |
On Mon 09-01-23 21:38:04, T.J. Mercier wrote:
> When a buffer is exported to userspace, use memcg to attribute the
> buffer to the allocating cgroup until all buffer references are
> released.
>
> Unlike the dmabuf sysfs stats implementation, this memcg accounting
> avoids contention over the kernfs_rwsem incurred when creating or
> removing nodes.

I am not familiar with dmabuf infrastructure so please bear with me.
AFAIU this patch adds a dmabuf specific counter to find out the amount
of dmabuf memory used. But I do not see any actual charging implemented
for that memory.

I have looked at two random users of dma_buf_export cma_heap_allocate
and it allocates pages to back the dmabuf (AFAIU) by cma_alloc
which doesn't account to memcg, system_heap_allocate uses
alloc_largest_available which relies on order_flags which doesn't seem
to ever use __GFP_ACCOUNT.

This would mean that the counter doesn't represent any actual memory
reflected in the overall memory consumption of a memcg. I believe this
is rather unexpected and confusing behavior. While some counters
overlap and their sum would exceed the charged memory we do not have any
that doesn't correspond to any memory (at least not for non-root memcgs).
On Tue, Jan 10, 2023 at 12:58 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 09-01-23 21:38:04, T.J. Mercier wrote:
> > When a buffer is exported to userspace, use memcg to attribute the
> > buffer to the allocating cgroup until all buffer references are
> > released.
> >
> > Unlike the dmabuf sysfs stats implementation, this memcg accounting
> > avoids contention over the kernfs_rwsem incurred when creating or
> > removing nodes.
>
> I am not familiar with dmabuf infrastructure so please bear with me.
> AFAIU this patch adds a dmabuf specific counter to find out the amount
> of dmabuf memory used. But I do not see any actual charging implemented
> for that memory.
>
> I have looked at two random users of dma_buf_export cma_heap_allocate
> and it allocates pages to back the dmabuf (AFAIU) by cma_alloc
> which doesn't account to memcg, system_heap_allocate uses
> alloc_largest_available which relies on order_flags which doesn't seem
> to ever use __GFP_ACCOUNT.
>
> This would mean that the counter doesn't represent any actual memory
> reflected in the overall memory consumption of a memcg. I believe this
> is rather unexpected and confusing behavior. While some counters
> overlap and their sum would exceed the charged memory we do not have any
> that doesn't correspond to any memory (at least not for non-root memcgs).
>
> --
> Michal Hocko
> SUSE Labs

Thank you, that behavior is not intentional. I'm not looking at the
overall memcg charge yet otherwise I would have noticed this. I think I
understand what's needed for the charging part, but Shakeel mentioned
some additional work for "reclaim, OOM and charge context and failure
cases" on the cover letter which I need to look into.
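The distinction raised in the review, between updating a statistics counter and actually charging memory, can be illustrated with a toy userspace model. This is only a sketch: `struct toy_memcg`, `toy_mod_stat` and `toy_try_charge` are hypothetical names invented for illustration and do not correspond to the kernel's `struct mem_cgroup` or its charging code. It assumes a simplified cgroup with one charged-usage value, one limit, and one informational counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a memory cgroup; NOT the kernel's struct mem_cgroup. */
struct toy_memcg {
	long long usage;       /* charged bytes; limits apply to this value */
	long long limit;       /* maximum charged bytes */
	long long stat_dmabuf; /* informational per-type counter only */
};

/* Stat-only update: adjusts the counter and never touches charged usage. */
void toy_mod_stat(struct toy_memcg *cg, long long delta)
{
	assert(cg);
	cg->stat_dmabuf += delta;
}

/* Charge: raises usage, and fails if the limit would be exceeded. */
bool toy_try_charge(struct toy_memcg *cg, long long bytes)
{
	assert(cg && bytes >= 0);
	if (cg->usage + bytes > cg->limit)
		return false;
	cg->usage += bytes;
	return true;
}

/* Uncharge: lowers usage by a previously charged amount. */
void toy_uncharge(struct toy_memcg *cg, long long bytes)
{
	assert(cg && bytes >= 0 && bytes <= cg->usage);
	cg->usage -= bytes;
}
```

In this model, a buffer accounted only through `toy_mod_stat` shows up in `stat_dmabuf` while `usage` stays at zero, so the limit is never consulted. That is the mismatch the review describes: a counter that does not correspond to any charged memory.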
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index c8ae7c897f14..538ae22bc514 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1455,6 +1455,10 @@ PAGE_SIZE multiple when read back.
 		Amount of memory used for storing in-kernel data
 		structures.
 
+	  dmabuf (npn)
+		Amount of memory used for exported DMA buffers allocated by the cgroup.
+		Stays with the allocating cgroup regardless of how the buffer is shared.
+
 	  workingset_refault_anon
 		Number of refaults of previously evicted anonymous pages.
 
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index e6528767efc7..ac45dd101c4d 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -75,6 +75,8 @@ static void dma_buf_release(struct dentry *dentry)
 	 */
 	BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
 
+	mod_memcg_state(dmabuf->memcg, MEMCG_DMABUF, -dmabuf->size);
+	mem_cgroup_put(dmabuf->memcg);
 	dma_buf_stats_teardown(dmabuf);
 	dmabuf->ops->release(dmabuf);
 
@@ -673,6 +675,9 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
 	if (ret)
 		goto err_dmabuf;
 
+	dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
+	mod_memcg_state(dmabuf->memcg, MEMCG_DMABUF, dmabuf->size);
+
 	file->private_data = dmabuf;
 	file->f_path.dentry->d_fsdata = dmabuf;
 	dmabuf->file = file;
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 6fa8d4e29719..1f0ffb8e4bf5 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -22,6 +22,7 @@
 #include <linux/fs.h>
 #include <linux/dma-fence.h>
 #include <linux/wait.h>
+#include <linux/memcontrol.h>
 
 struct device;
 struct dma_buf;
@@ -446,6 +447,8 @@ struct dma_buf {
 		struct dma_buf *dmabuf;
 	} *sysfs_entry;
 #endif
+	/* The cgroup to which this buffer is currently attributed */
+	struct mem_cgroup *memcg;
 };
 
 /**
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d3c8203cab6c..1c1da2da20a6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -37,6 +37,7 @@ enum memcg_stat_item {
 	MEMCG_KMEM,
 	MEMCG_ZSWAP_B,
 	MEMCG_ZSWAPPED,
+	MEMCG_DMABUF,
 	MEMCG_NR_STAT,
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ab457f0394ab..680189bec7e0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1502,6 +1502,7 @@ static const struct memory_stat memory_stats[] = {
 	{ "unevictable", NR_UNEVICTABLE },
 	{ "slab_reclaimable", NR_SLAB_RECLAIMABLE_B },
 	{ "slab_unreclaimable", NR_SLAB_UNRECLAIMABLE_B },
+	{ "dmabuf", MEMCG_DMABUF },
 
 	/* The memory events */
 	{ "workingset_refault_anon", WORKINGSET_REFAULT_ANON },
@@ -1519,6 +1520,7 @@ static int memcg_page_state_unit(int item)
 	switch (item) {
 	case MEMCG_PERCPU_B:
 	case MEMCG_ZSWAP_B:
+	case MEMCG_DMABUF:
 	case NR_SLAB_RECLAIMABLE_B:
 	case NR_SLAB_UNRECLAIMABLE_B:
 	case WORKINGSET_REFAULT_ANON:
@@ -4042,6 +4044,7 @@ static const unsigned int memcg1_stats[] = {
 	WORKINGSET_REFAULT_ANON,
 	WORKINGSET_REFAULT_FILE,
 	MEMCG_SWAP,
+	MEMCG_DMABUF,
 };
 
 static const char *const memcg1_stat_names[] = {
@@ -4057,6 +4060,7 @@ static const char *const memcg1_stat_names[] = {
 	"workingset_refault_anon",
 	"workingset_refault_file",
 	"swap",
+	"dmabuf",
 };
 
 /* Universal VM events cgroup1 shows, original sort order */
When a buffer is exported to userspace, use memcg to attribute the
buffer to the allocating cgroup until all buffer references are
released.

Unlike the dmabuf sysfs stats implementation, this memcg accounting
avoids contention over the kernfs_rwsem incurred when creating or
removing nodes.

Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 4 ++++
 drivers/dma-buf/dma-buf.c               | 5 +++++
 include/linux/dma-buf.h                 | 3 +++
 include/linux/memcontrol.h              | 1 +
 mm/memcontrol.c                         | 4 ++++
 5 files changed, 17 insertions(+)
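On a kernel carrying this patch, the new counter would presumably appear as a `dmabuf <bytes>` line in a cgroup's `memory.stat`, alongside the existing entries. The following userspace sketch shows one way such a line could be read. The helper names `memstat_lookup` and `memstat_read` are hypothetical, the 8 KiB buffer is an assumed upper bound on the file size, and the cgroup path in the note below is only an example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find one "key value" line in memory.stat-formatted text.
 * Returns 0 and stores the value in *out, or -1 if the key is absent.
 */
int memstat_lookup(const char *text, const char *key, unsigned long long *out)
{
	const char *line = text;
	size_t klen;

	assert(text && key && out);
	klen = strlen(key);

	while (*line) {
		const char *eol = strchr(line, '\n');

		/* Match the whole key: it must be followed by one space. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ') {
			const char *num = line + klen + 1;
			char *end;
			unsigned long long v = strtoull(num, &end, 10);

			if (end != num) {
				*out = v;
				return 0;
			}
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return -1;
}

/* Read a memory.stat file and look up key; -1 on I/O error or missing key. */
int memstat_read(const char *path, const char *key, unsigned long long *out)
{
	char buf[8192]; /* assumed large enough; longer files are truncated */
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return memstat_lookup(buf, key, out);
}
```

A caller might invoke `memstat_read("/sys/fs/cgroup/mygroup/memory.stat", "dmabuf", &bytes)`, where `mygroup` is a placeholder. On kernels without the patch the key is simply absent and the call returns -1. As noted in the review, the value would reflect attribution only, not memory charged against the cgroup's limit.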