Message ID | alpine.DEB.2.23.453.2007161357490.3209847@chino.kir.corp.google.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm, memcg: provide an anon_reclaimable stat |
On Thu, Jul 16, 2020 at 1:58 PM David Rientjes <rientjes@google.com> wrote:
>
> Userspace can lack insight into the amount of memory that can be reclaimed
> from a memcg based on values from memory.stat. Two specific examples:
>
>  - Lazy freeable memory (MADV_FREE) that are clean anonymous pages on the
>    inactive file LRU that can be quickly reclaimed under memory pressure
>    but otherwise shows up as mapped anon in memory.stat, and
>
>  - Memory on deferred split queues (thp) that are compound pages that can
>    be split and uncharged from the memcg under memory pressure, but
>    otherwise shows up as charged anon LRU memory in memory.stat.
>
> Both of this anonymous usage is also charged to memory.current.
>
> Userspace can currently derive this information but it depends on kernel
> implementation details for how this memory is handled for the purposes of
> reclaim (anon on inactive file LRU or unmapped anon on the LRU).
>
> For the purposes of writing portable userspace code that does not need to
> have insight into the kernel implementation for reclaimable memory, this
> exports a stat that reveals the amount of anonymous memory that can be
> reclaimed and uncharged from the memcg to start new applications.
>
> As the kernel implementation evolves for memory that can be reclaimed
> under memory pressure, this stat can be kept consistent.
>
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst |  6 +++++
>  mm/memcontrol.c                         | 31 +++++++++++++++++++++++++
>  2 files changed, 37 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1296,6 +1296,12 @@ PAGE_SIZE multiple when read back.
>  		Amount of memory used in anonymous mappings backed by
>  		transparent hugepages
>
> +	  anon_reclaimable
> +		The amount of charged anonymous memory that can be reclaimed
> +		under memory pressure without swap. This currently includes
> +		lazy freeable memory (MADV_FREE) and compound pages that can be
> +		split and uncharged.
> +
>  	  inactive_anon, active_anon, inactive_file, active_file, unevictable
>  		Amount of memory, swap-backed and filesystem-backed,
>  		on the internal memory management lists used by the
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1350,6 +1350,32 @@ static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
>  	return false;
>  }
>
> +/*
> + * Returns the amount of anon memory that is charged to the memcg that is
> + * reclaimable under memory pressure without swap, in pages.
> + */
> +static unsigned long memcg_anon_reclaimable(struct mem_cgroup *memcg)
> +{
> +	long deferred, lazyfree;
> +
> +	/*
> +	 * Deferred pages are charged anonymous pages that are on the LRU but
> +	 * are unmapped. These compound pages are split under memory pressure.
> +	 */
> +	deferred = max_t(long, memcg_page_state(memcg, NR_ACTIVE_ANON) +
> +			       memcg_page_state(memcg, NR_INACTIVE_ANON) -
> +			       memcg_page_state(memcg, NR_ANON_MAPPED), 0);

Please note that the NR_ANON_MAPPED does not include tmpfs memory but
NR_[IN]ACTIVE_ANON does include the tmpfs.

> +	/*
> +	 * Lazyfree pages are charged clean anonymous pages that are on the file
> +	 * LRU and can be reclaimed under memory pressure.
> +	 */
> +	lazyfree = max_t(long, memcg_page_state(memcg, NR_ACTIVE_FILE) +
> +			       memcg_page_state(memcg, NR_INACTIVE_FILE) -
> +			       memcg_page_state(memcg, NR_FILE_PAGES), 0);

Similarly NR_FILE_PAGES includes tmpfs memory but NR_[IN]ACTIVE_FILE does not.
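
To make the two review comments concrete, here is a minimal userspace sketch of the patch's arithmetic. It is not kernel code: `struct memcg_counters`, `patch_deferred()`, `patch_lazyfree()` and `charge_tmpfs()` are hypothetical names, and the counter values used with them are made up. The only modelling assumption is the one stated in the review: tmpfs pages are counted in NR_[IN]ACTIVE_ANON and NR_FILE_PAGES, but not in NR_ANON_MAPPED or NR_[IN]ACTIVE_FILE.

```c
/* Hypothetical snapshot of the per-memcg counters the patch reads, in pages. */
struct memcg_counters {
	long active_anon, inactive_anon, anon_mapped;
	long active_file, inactive_file, file_pages;
};

static long clamp_zero(long v)
{
	return v > 0 ? v : 0;
}

/* Same arithmetic as the "deferred" term in the posted patch. */
static long patch_deferred(const struct memcg_counters *c)
{
	return clamp_zero(c->active_anon + c->inactive_anon - c->anon_mapped);
}

/* Same arithmetic as the "lazyfree" term in the posted patch. */
static long patch_lazyfree(const struct memcg_counters *c)
{
	return clamp_zero(c->active_file + c->inactive_file - c->file_pages);
}

/*
 * Model charging nr tmpfs pages, following the review comments: the pages
 * land on the anon LRU and in NR_FILE_PAGES, and nowhere else.
 */
static struct memcg_counters charge_tmpfs(struct memcg_counters c, long nr)
{
	c.inactive_anon += nr;
	c.file_pages += nr;
	return c;
}
```

With a made-up baseline of `{400, 150, 500, 600, 600, 1000}` the two terms come out as 50 and 200 pages. After `charge_tmpfs(base, 300)` the deferred term grows to 350 and the lazyfree term is clamped to 0, although nothing about deferred or lazyfree memory changed.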
On Thu, 16 Jul 2020, Shakeel Butt wrote:

> > Userspace can lack insight into the amount of memory that can be reclaimed
> > from a memcg based on values from memory.stat. Two specific examples:
> >
> >  - Lazy freeable memory (MADV_FREE) that are clean anonymous pages on the
> >    inactive file LRU that can be quickly reclaimed under memory pressure
> >    but otherwise shows up as mapped anon in memory.stat, and
> >
> >  - Memory on deferred split queues (thp) that are compound pages that can
> >    be split and uncharged from the memcg under memory pressure, but
> >    otherwise shows up as charged anon LRU memory in memory.stat.
> >
> > Both of this anonymous usage is also charged to memory.current.
> >
> > Userspace can currently derive this information but it depends on kernel
> > implementation details for how this memory is handled for the purposes of
> > reclaim (anon on inactive file LRU or unmapped anon on the LRU).
> >
> > For the purposes of writing portable userspace code that does not need to
> > have insight into the kernel implementation for reclaimable memory, this
> > exports a stat that reveals the amount of anonymous memory that can be
> > reclaimed and uncharged from the memcg to start new applications.
> >
> > As the kernel implementation evolves for memory that can be reclaimed
> > under memory pressure, this stat can be kept consistent.
> >
> > Signed-off-by: David Rientjes <rientjes@google.com>
> > ---
> >  Documentation/admin-guide/cgroup-v2.rst |  6 +++++
> >  mm/memcontrol.c                         | 31 +++++++++++++++++++++++++
> >  2 files changed, 37 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1296,6 +1296,12 @@ PAGE_SIZE multiple when read back.
> >  		Amount of memory used in anonymous mappings backed by
> >  		transparent hugepages
> >
> > +	  anon_reclaimable
> > +		The amount of charged anonymous memory that can be reclaimed
> > +		under memory pressure without swap. This currently includes
> > +		lazy freeable memory (MADV_FREE) and compound pages that can be
> > +		split and uncharged.
> > +
> >  	  inactive_anon, active_anon, inactive_file, active_file, unevictable
> >  		Amount of memory, swap-backed and filesystem-backed,
> >  		on the internal memory management lists used by the
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1350,6 +1350,32 @@ static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
> >  	return false;
> >  }
> >
> > +/*
> > + * Returns the amount of anon memory that is charged to the memcg that is
> > + * reclaimable under memory pressure without swap, in pages.
> > + */
> > +static unsigned long memcg_anon_reclaimable(struct mem_cgroup *memcg)
> > +{
> > +	long deferred, lazyfree;
> > +
> > +	/*
> > +	 * Deferred pages are charged anonymous pages that are on the LRU but
> > +	 * are unmapped. These compound pages are split under memory pressure.
> > +	 */
> > +	deferred = max_t(long, memcg_page_state(memcg, NR_ACTIVE_ANON) +
> > +			       memcg_page_state(memcg, NR_INACTIVE_ANON) -
> > +			       memcg_page_state(memcg, NR_ANON_MAPPED), 0);
>
> Please note that the NR_ANON_MAPPED does not include tmpfs memory but
> NR_[IN]ACTIVE_ANON does include the tmpfs.
>
> > +	/*
> > +	 * Lazyfree pages are charged clean anonymous pages that are on the file
> > +	 * LRU and can be reclaimed under memory pressure.
> > +	 */
> > +	lazyfree = max_t(long, memcg_page_state(memcg, NR_ACTIVE_FILE) +
> > +			       memcg_page_state(memcg, NR_INACTIVE_FILE) -
> > +			       memcg_page_state(memcg, NR_FILE_PAGES), 0);
>
> Similarly NR_FILE_PAGES includes tmpfs memory but NR_[IN]ACTIVE_FILE does not.
>

Ah, so this adds to the motivation of providing the anon_reclaimable stat
because the calculation becomes even more convoluted and completely based
on the kernel implementation details for both lazyfree memory and deferred
split queues.

Did you have a calculation in mind for
memcg_anon_reclaimable()?
On Thu, Jul 16, 2020 at 2:28 PM David Rientjes <rientjes@google.com> wrote:
>
> On Thu, 16 Jul 2020, Shakeel Butt wrote:
>
> > > Userspace can lack insight into the amount of memory that can be reclaimed
> > > from a memcg based on values from memory.stat. Two specific examples:
> > >
> > >  - Lazy freeable memory (MADV_FREE) that are clean anonymous pages on the
> > >    inactive file LRU that can be quickly reclaimed under memory pressure
> > >    but otherwise shows up as mapped anon in memory.stat, and
> > >
> > >  - Memory on deferred split queues (thp) that are compound pages that can
> > >    be split and uncharged from the memcg under memory pressure, but
> > >    otherwise shows up as charged anon LRU memory in memory.stat.
> > >
> > > Both of this anonymous usage is also charged to memory.current.
> > >
> > > Userspace can currently derive this information but it depends on kernel
> > > implementation details for how this memory is handled for the purposes of
> > > reclaim (anon on inactive file LRU or unmapped anon on the LRU).
> > >
> > > For the purposes of writing portable userspace code that does not need to
> > > have insight into the kernel implementation for reclaimable memory, this
> > > exports a stat that reveals the amount of anonymous memory that can be
> > > reclaimed and uncharged from the memcg to start new applications.
> > >
> > > As the kernel implementation evolves for memory that can be reclaimed
> > > under memory pressure, this stat can be kept consistent.
> > >
> > > Signed-off-by: David Rientjes <rientjes@google.com>
> > > ---
> > >  Documentation/admin-guide/cgroup-v2.rst |  6 +++++
> > >  mm/memcontrol.c                         | 31 +++++++++++++++++++++++++
> > >  2 files changed, 37 insertions(+)
> > >
> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1296,6 +1296,12 @@ PAGE_SIZE multiple when read back.
> > >  		Amount of memory used in anonymous mappings backed by
> > >  		transparent hugepages
> > >
> > > +	  anon_reclaimable
> > > +		The amount of charged anonymous memory that can be reclaimed
> > > +		under memory pressure without swap. This currently includes
> > > +		lazy freeable memory (MADV_FREE) and compound pages that can be
> > > +		split and uncharged.
> > > +
> > >  	  inactive_anon, active_anon, inactive_file, active_file, unevictable
> > >  		Amount of memory, swap-backed and filesystem-backed,
> > >  		on the internal memory management lists used by the
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -1350,6 +1350,32 @@ static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
> > >  	return false;
> > >  }
> > >
> > > +/*
> > > + * Returns the amount of anon memory that is charged to the memcg that is
> > > + * reclaimable under memory pressure without swap, in pages.
> > > + */
> > > +static unsigned long memcg_anon_reclaimable(struct mem_cgroup *memcg)
> > > +{
> > > +	long deferred, lazyfree;
> > > +
> > > +	/*
> > > +	 * Deferred pages are charged anonymous pages that are on the LRU but
> > > +	 * are unmapped. These compound pages are split under memory pressure.
> > > +	 */
> > > +	deferred = max_t(long, memcg_page_state(memcg, NR_ACTIVE_ANON) +
> > > +			       memcg_page_state(memcg, NR_INACTIVE_ANON) -
> > > +			       memcg_page_state(memcg, NR_ANON_MAPPED), 0);
> >
> > Please note that the NR_ANON_MAPPED does not include tmpfs memory but
> > NR_[IN]ACTIVE_ANON does include the tmpfs.
> >
> > > +	/*
> > > +	 * Lazyfree pages are charged clean anonymous pages that are on the file
> > > +	 * LRU and can be reclaimed under memory pressure.
> > > +	 */
> > > +	lazyfree = max_t(long, memcg_page_state(memcg, NR_ACTIVE_FILE) +
> > > +			       memcg_page_state(memcg, NR_INACTIVE_FILE) -
> > > +			       memcg_page_state(memcg, NR_FILE_PAGES), 0);
> >
> > Similarly NR_FILE_PAGES includes tmpfs memory but NR_[IN]ACTIVE_FILE does not.
> >
>
> Ah, so this adds to the motivation of providing the anon_reclaimable stat
> because the calculation becomes even more convoluted and completely based
> on the kernel implementation details for both lazyfree memory and deferred
> split queues.

Yes, I agree.

> Did you have a calculation in mind for
> memcg_anon_reclaimable()?

For deferred, "memcg->deferred_split_queue.split_queue_len" should be usable.
For lazyfree, NR_ACTIVE_FILE + NR_INACTIVE_FILE + NR_SHMEM - NR_FILE_PAGES
seems like the right formula.
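
The lazyfree formula suggested above can be checked with a small userspace sketch. This is only the arithmetic, not a proposed kernel change: `struct file_lru_counters` and `lazyfree_estimate()` are hypothetical names, the counters are plain page counts supplied by the caller, and the clamp to zero is carried over from the posted patch. The deferred term is left out because it would read a kernel-internal queue length.

```c
/* Hypothetical snapshot of the counters named in the suggested formula, in pages. */
struct file_lru_counters {
	long active_file;	/* NR_ACTIVE_FILE */
	long inactive_file;	/* NR_INACTIVE_FILE */
	long shmem;		/* NR_SHMEM */
	long file_pages;	/* NR_FILE_PAGES */
};

/*
 * NR_ACTIVE_FILE + NR_INACTIVE_FILE + NR_SHMEM - NR_FILE_PAGES, clamped at
 * zero because the counters are read individually and may be inconsistent.
 * Adding NR_SHMEM cancels the tmpfs pages that NR_FILE_PAGES includes but
 * the file LRU counters do not.
 */
static long lazyfree_estimate(const struct file_lru_counters *c)
{
	long v = c->active_file + c->inactive_file + c->shmem - c->file_pages;

	return v > 0 ? v : 0;
}
```

For made-up counters `{600, 600, 300, 1300}` (300 tmpfs pages charged) the estimate is 200 pages, the same as for `{600, 600, 0, 1000}` with no tmpfs at all, which is the property the original subtraction lacked.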
On Thu 16-07-20 13:58:19, David Rientjes wrote:
> Userspace can lack insight into the amount of memory that can be reclaimed
> from a memcg based on values from memory.stat. Two specific examples:
>
>  - Lazy freeable memory (MADV_FREE) that are clean anonymous pages on the
>    inactive file LRU that can be quickly reclaimed under memory pressure
>    but otherwise shows up as mapped anon in memory.stat, and
>
>  - Memory on deferred split queues (thp) that are compound pages that can
>    be split and uncharged from the memcg under memory pressure, but
>    otherwise shows up as charged anon LRU memory in memory.stat.
>
> Both of this anonymous usage is also charged to memory.current.
>
> Userspace can currently derive this information but it depends on kernel
> implementation details for how this memory is handled for the purposes of
> reclaim (anon on inactive file LRU or unmapped anon on the LRU).
>
> For the purposes of writing portable userspace code that does not need to
> have insight into the kernel implementation for reclaimable memory, this
> exports a stat that reveals the amount of anonymous memory that can be
> reclaimed and uncharged from the memcg to start new applications.
>
> As the kernel implementation evolves for memory that can be reclaimed
> under memory pressure, this stat can be kept consistent.

Please be much more specific about the expected usage. You have
mentioned something in the email thread but this really belongs to the
changelog.

Why is reclaimable anonymous memory without any swap any special, say
from any other clean and easily reclaimable caches? What if there is a
swap available?

> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst |  6 +++++
>  mm/memcontrol.c                         | 31 +++++++++++++++++++++++++
>  2 files changed, 37 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1296,6 +1296,12 @@ PAGE_SIZE multiple when read back.
>  		Amount of memory used in anonymous mappings backed by
>  		transparent hugepages
>
> +	  anon_reclaimable
> +		The amount of charged anonymous memory that can be reclaimed
> +		under memory pressure without swap. This currently includes
> +		lazy freeable memory (MADV_FREE) and compound pages that can be
> +		split and uncharged.
> +
>  	  inactive_anon, active_anon, inactive_file, active_file, unevictable
>  		Amount of memory, swap-backed and filesystem-backed,
>  		on the internal memory management lists used by the
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1350,6 +1350,32 @@ static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
>  	return false;
>  }
>
> +/*
> + * Returns the amount of anon memory that is charged to the memcg that is
> + * reclaimable under memory pressure without swap, in pages.
> + */
> +static unsigned long memcg_anon_reclaimable(struct mem_cgroup *memcg)
> +{
> +	long deferred, lazyfree;
> +
> +	/*
> +	 * Deferred pages are charged anonymous pages that are on the LRU but
> +	 * are unmapped. These compound pages are split under memory pressure.
> +	 */
> +	deferred = max_t(long, memcg_page_state(memcg, NR_ACTIVE_ANON) +
> +			       memcg_page_state(memcg, NR_INACTIVE_ANON) -
> +			       memcg_page_state(memcg, NR_ANON_MAPPED), 0);
> +	/*
> +	 * Lazyfree pages are charged clean anonymous pages that are on the file
> +	 * LRU and can be reclaimed under memory pressure.
> +	 */
> +	lazyfree = max_t(long, memcg_page_state(memcg, NR_ACTIVE_FILE) +
> +			       memcg_page_state(memcg, NR_INACTIVE_FILE) -
> +			       memcg_page_state(memcg, NR_FILE_PAGES), 0);
> +
> +	return deferred + lazyfree;
> +}
> +
>  static char *memory_stat_format(struct mem_cgroup *memcg)
>  {
>  	struct seq_buf s;
> @@ -1363,6 +1389,9 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
>  	 * Provide statistics on the state of the memory subsystem as
>  	 * well as cumulative event counters that show past behavior.
>  	 *
> +	 * All values in this buffer are read individually, so no implied
> +	 * consistency amongst them.
> +	 *
>  	 * This list is ordered following a combination of these gradients:
>  	 * 1) generic big picture -> specifics and details
>  	 * 2) reflecting userspace activity -> reflecting kernel heuristics
> @@ -1405,6 +1434,8 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
>  		       (u64)memcg_page_state(memcg, NR_ANON_THPS) *
>  		       HPAGE_PMD_SIZE);
>  #endif
> +	seq_buf_printf(&s, "anon_reclaimable %llu\n",
> +		       (u64)memcg_anon_reclaimable(memcg) * PAGE_SIZE);
>
>  	for (i = 0; i < NR_LRU_LISTS; i++)
>  		seq_buf_printf(&s, "%s %llu\n", lru_list_name(i),
On Thu, Jul 16, 2020 at 01:58:19PM -0700, David Rientjes wrote:
> @@ -1350,6 +1350,32 @@ static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
>  	return false;
>  }
>
> +/*
> + * Returns the amount of anon memory that is charged to the memcg that is
> + * reclaimable under memory pressure without swap, in pages.
> + */
> +static unsigned long memcg_anon_reclaimable(struct mem_cgroup *memcg)
> +{
> +	long deferred, lazyfree;
> +
> +	/*
> +	 * Deferred pages are charged anonymous pages that are on the LRU but
> +	 * are unmapped. These compound pages are split under memory pressure.
> +	 */
> +	deferred = max_t(long, memcg_page_state(memcg, NR_ACTIVE_ANON) +
> +			       memcg_page_state(memcg, NR_INACTIVE_ANON) -
> +			       memcg_page_state(memcg, NR_ANON_MAPPED), 0);
> +	/*
> +	 * Lazyfree pages are charged clean anonymous pages that are on the file
> +	 * LRU and can be reclaimed under memory pressure.
> +	 */
> +	lazyfree = max_t(long, memcg_page_state(memcg, NR_ACTIVE_FILE) +
> +			       memcg_page_state(memcg, NR_INACTIVE_FILE) -
> +			       memcg_page_state(memcg, NR_FILE_PAGES), 0);

Unfortunately, we don't know if these have been reused after the
madvise until we actually do the rmap walk in page reclaim. All of
these could have dirty ptes and require swapout after all.

The MADV_FREE tradeoff was that the freed pages can get reused by
userspace without another context switch and tlb flush in the common
case, by exploiting the fact that the MMU sets the dirty bit for
us. The downside is that the kernel doesn't know what state these
pages are in until it takes a close-up look at them one by one.
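
The reuse pattern described above looks roughly like this from userspace. This is a minimal sketch for Linux only: `lazyfree_then_reuse()` is a hypothetical helper, and the fallback `#define` for `MADV_FREE` is an assumption for older headers. It cannot observe the kernel's LRU or counters; it only shows that the store after `madvise()` needs no further system call, which is why the kernel cannot tell the page's state without inspecting it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_FREE
#define MADV_FREE 8	/* Linux value; assumption for headers that predate it */
#endif

/*
 * Map npages of anonymous memory, dirty them, mark them lazily freeable,
 * then reuse the first page.
 *
 * Returns 0 if MADV_FREE was accepted, 1 if the kernel rejected the advice
 * (for example because it is unsupported), -1 on any other failure.
 */
static int lazyfree_then_reuse(size_t npages)
{
	size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *buf;
	int rc = 0;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xaa, len);		/* fault in and dirty every page */

	if (madvise(buf, len, MADV_FREE) != 0)
		rc = 1;			/* advice not accepted */

	/*
	 * Reuse: a plain store, no system call. The MMU marks the PTE dirty
	 * again, and the kernel only finds out when reclaim looks at the page.
	 */
	buf[0] = 0x55;
	if (buf[0] != 0x55)
		rc = -1;		/* a store after MADV_FREE must stick */

	munmap(buf, len);
	return rc;
}
```

After the store, the first page holds live data again while the remaining pages are still candidates for being dropped, and no counter was updated to say so.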
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1296,6 +1296,12 @@ PAGE_SIZE multiple when read back.
 		Amount of memory used in anonymous mappings backed by
 		transparent hugepages
 
+	  anon_reclaimable
+		The amount of charged anonymous memory that can be reclaimed
+		under memory pressure without swap. This currently includes
+		lazy freeable memory (MADV_FREE) and compound pages that can be
+		split and uncharged.
+
 	  inactive_anon, active_anon, inactive_file, active_file, unevictable
 		Amount of memory, swap-backed and filesystem-backed,
 		on the internal memory management lists used by the
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1350,6 +1350,32 @@ static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
 	return false;
 }
 
+/*
+ * Returns the amount of anon memory that is charged to the memcg that is
+ * reclaimable under memory pressure without swap, in pages.
+ */
+static unsigned long memcg_anon_reclaimable(struct mem_cgroup *memcg)
+{
+	long deferred, lazyfree;
+
+	/*
+	 * Deferred pages are charged anonymous pages that are on the LRU but
+	 * are unmapped. These compound pages are split under memory pressure.
+	 */
+	deferred = max_t(long, memcg_page_state(memcg, NR_ACTIVE_ANON) +
+			       memcg_page_state(memcg, NR_INACTIVE_ANON) -
+			       memcg_page_state(memcg, NR_ANON_MAPPED), 0);
+	/*
+	 * Lazyfree pages are charged clean anonymous pages that are on the file
+	 * LRU and can be reclaimed under memory pressure.
+	 */
+	lazyfree = max_t(long, memcg_page_state(memcg, NR_ACTIVE_FILE) +
+			       memcg_page_state(memcg, NR_INACTIVE_FILE) -
+			       memcg_page_state(memcg, NR_FILE_PAGES), 0);
+
+	return deferred + lazyfree;
+}
+
 static char *memory_stat_format(struct mem_cgroup *memcg)
 {
 	struct seq_buf s;
@@ -1363,6 +1389,9 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
 	 * Provide statistics on the state of the memory subsystem as
 	 * well as cumulative event counters that show past behavior.
 	 *
+	 * All values in this buffer are read individually, so no implied
+	 * consistency amongst them.
+	 *
 	 * This list is ordered following a combination of these gradients:
 	 * 1) generic big picture -> specifics and details
 	 * 2) reflecting userspace activity -> reflecting kernel heuristics
@@ -1405,6 +1434,8 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
 		       (u64)memcg_page_state(memcg, NR_ANON_THPS) *
 		       HPAGE_PMD_SIZE);
 #endif
+	seq_buf_printf(&s, "anon_reclaimable %llu\n",
+		       (u64)memcg_anon_reclaimable(memcg) * PAGE_SIZE);
 
 	for (i = 0; i < NR_LRU_LISTS; i++)
 		seq_buf_printf(&s, "%s %llu\n", lru_list_name(i),
Userspace can lack insight into the amount of memory that can be reclaimed
from a memcg based on values from memory.stat. Two specific examples:

 - Lazy freeable memory (MADV_FREE) that are clean anonymous pages on the
   inactive file LRU that can be quickly reclaimed under memory pressure
   but otherwise shows up as mapped anon in memory.stat, and

 - Memory on deferred split queues (thp) that are compound pages that can
   be split and uncharged from the memcg under memory pressure, but
   otherwise shows up as charged anon LRU memory in memory.stat.

Both of this anonymous usage is also charged to memory.current.

Userspace can currently derive this information but it depends on kernel
implementation details for how this memory is handled for the purposes of
reclaim (anon on inactive file LRU or unmapped anon on the LRU).

For the purposes of writing portable userspace code that does not need to
have insight into the kernel implementation for reclaimable memory, this
exports a stat that reveals the amount of anonymous memory that can be
reclaimed and uncharged from the memcg to start new applications.

As the kernel implementation evolves for memory that can be reclaimed
under memory pressure, this stat can be kept consistent.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  6 +++++
 mm/memcontrol.c                         | 31 +++++++++++++++++++++++++
 2 files changed, 37 insertions(+)
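
For illustration, the "portable userspace code" the changelog has in mind might consume the new field along these lines. This is a sketch under assumptions: `memstat_lookup()` is a hypothetical helper, the buffer is taken to hold memory.stat-style "name value" lines already read from the cgroup, and an `anon_reclaimable` line only exists on a kernel carrying this patch, so callers have to handle its absence.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up "key" in a memory.stat-style buffer (one "name value" pair per
 * line). Returns 0 and stores the value in *out on success, -1 if the key
 * is absent or its value does not parse.
 */
static int memstat_lookup(const char *buf, const char *key,
			  unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *line = buf;

	while (line && *line) {
		/* Require a space after the key so "anon" does not match "anon_thp". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
			return sscanf(line + klen + 1, "%llu", out) == 1 ? 0 : -1;

		line = strchr(line, '\n');
		if (line)
			line++;		/* step past the newline to the next entry */
	}
	return -1;
}
```

A caller would read the cgroup's memory.stat into a buffer, call `memstat_lookup(buf, "anon_reclaimable", &bytes)`, and fall back to its own estimate when the call returns -1.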