Message ID: 20210304235949.7922C1C3@viggo.jf.intel.com (mailing list archive)
Series: Migrate Pages in lieu of discard
On Thu, Mar 4, 2021 at 4:00 PM Dave Hansen <dave.hansen@linux.intel.com> wrote:
>
> The full series is also available here:
>
>         https://github.com/hansendc/linux/tree/automigrate-20210304
>
> which also includes some vm.zone_reclaim_mode sysctl ABI fixup
> prerequisites.
>
> The meat of this patch is in:
>
>         [PATCH 05/10] mm/migrate: demote pages during reclaim
>
> which also has the most changes since the last post. This version is
> mostly to address review comments from Yang Shi and Oscar Salvador.
> Review comments are documented in the individual patch changelogs.
>
> This also contains a few prerequisite patches that fix up an issue
> with the vm.zone_reclaim_mode sysctl ABI.
>
> Changes since (automigrate-20210122):
>  * Move from GFP_HIGHUSER -> GFP_HIGHUSER_MOVABLE since pages *are*
>    movable.
>  * Separate out helpers that check for being able to reclaim anonymous
>    pages versus being able to meaningfully scan the anon LRU.
>
> --
>
> We're starting to see systems with more and more kinds of memory, such
> as Intel's implementation of persistent memory.
>
> Let's say you have a system with some DRAM and some persistent memory.
> Today, once DRAM fills up, reclaim will start and some of the DRAM
> contents will be thrown out. Allocations will, at some point, start
> falling over to the slower persistent memory.
>
> That has two nasty properties. First, the newer allocations can end
> up in the slower persistent memory. Second, reclaimed data in DRAM
> are just discarded even if there are gobs of space in persistent
> memory that could be used.
>
> This set implements a solution to these problems. At the end of the
> reclaim process in shrink_page_list(), just before the last page
> refcount is dropped, the page is migrated to persistent memory instead
> of being dropped.
>
> While I've talked about a DRAM/PMEM pairing, this approach would
> function in any environment where memory tiers exist.
>
> This is not perfect.
> It "strands" pages in slower memory and never
> brings them back to fast DRAM. Other things need to be built to
> promote hot pages back to DRAM.
>
> This is also all based on an upstream mechanism that allows
> persistent memory to be onlined and used as if it were volatile:
>
>         http://lkml.kernel.org/r/20190124231441.37A4A305@viggo.jf.intel.com
>
> == Open Issues ==
>
>  * For cpusets and memory policies that restrict allocations
>    to PMEM, is it OK to demote to PMEM? Do we need a cgroup-
>    level API to opt-in or opt-out of these migrations?

I'm wondering whether such use cases, which don't want memory
allocated on PMEM, would allow that memory to be swapped out or
reclaimed at all. If swap is allowed, then I fail to see why migrating
to PMEM should be disallowed. If swap is not allowed, they should call
mlock(), and then the memory won't be migrated to PMEM either.

>  * Could be more aggressive about where anon LRU scanning occurs
>    since it no longer necessarily involves I/O. get_scan_count()
>    for instance says: "If we have no swap space, do not bother
>    scanning anon pages"

Yes, I agree. Johannes's patchset
(https://lore.kernel.org/linux-mm/20200520232525.798933-1-hannes@cmpxchg.org/#r)
lifted the maximum swappiness to 200 so the anonymous LRU can be
scanned more aggressively. We could definitely tweak this if needed.
>
> --
>
>  Documentation/admin-guide/sysctl/vm.rst |    9
>  include/linux/migrate.h                 |   20 +
>  include/linux/swap.h                    |    3
>  include/linux/vm_event_item.h           |    2
>  include/trace/events/migrate.h          |    3
>  include/uapi/linux/mempolicy.h          |    1
>  mm/compaction.c                         |    3
>  mm/gup.c                                |    4
>  mm/internal.h                           |    5
>  mm/memory-failure.c                     |    4
>  mm/memory_hotplug.c                     |    4
>  mm/mempolicy.c                          |    8
>  mm/migrate.c                            |  369 +++++++++++++++++++++++++++++---
>  mm/page_alloc.c                         |   13 -
>  mm/vmscan.c                             |  173 +++++++++++++--
>  mm/vmstat.c                             |    2
>  16 files changed, 560 insertions(+), 63 deletions(-)
>
> --
>
> Changes since (automigrate-20200818):
>  * Fall back to normal reclaim when demotion fails
>  * Fix some compile issues when page migration and NUMA are off
>
> Changes since (automigrate-20201007):
>  * Separate out checks for "can scan anon LRU" from "can actually
>    swap anon pages right now". The previous series conflated them
>    and may have been overly aggressive in scanning the LRU.
>  * Add MR_DEMOTION to tracepoint header
>  * Remove unnecessary hugetlb page check
>
> Changes since (https://lwn.net/Articles/824830/):
>  * Use the higher-level migrate_pages() API approach from Yang Shi's
>    earlier patches.
>  * Made sure to actually check node_reclaim_mode's new bit
>  * Disabled migration entirely before introducing RECLAIM_MIGRATE
>  * Replace GFP_NOWAIT with explicit __GFP_KSWAPD_RECLAIM and
>    comment on why we want that.
>  * Comment on the effects that keep multiple source nodes from
>    sharing target nodes
>
> Cc: Yang Shi <yang.shi@linux.alibaba.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: osalvador <osalvador@suse.de>
...
> > == Open Issues ==
> >
> >  * For cpusets and memory policies that restrict allocations
> >    to PMEM, is it OK to demote to PMEM? Do we need a cgroup-
> >    level API to opt-in or opt-out of these migrations?
>
> I'm wondering whether such use cases, which don't want memory
> allocated on PMEM, would allow that memory to be swapped out or
> reclaimed at all. If swap is allowed, then I fail to see why migrating
> to PMEM should be disallowed. If swap is not allowed, they should call
> mlock(), and then the memory won't be migrated to PMEM either.

Agreed. I have a hard time imagining there are a lot of folks who can
tolerate the massive overhead of swapping but can't tolerate the much
smaller overhead of going to PMEM instead of DRAM.
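The core idea the cover letter describes — at the end of reclaim in shrink_page_list(), migrate a page to a slower tier instead of discarding it, falling back to normal reclaim when demotion fails — can be sketched as a userspace toy model. Everything here (fake_page, demotion_target(), the two-node layout, the enum names) is invented for illustration; it is not the kernel's actual code, which uses the migrate_pages() API and per-node demotion targets:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the reclaim-time decision. Node 0 stands in for a fast
 * DRAM node, node 1 for a slower PMEM node. All names are made up. */

enum tier { TIER_DRAM, TIER_PMEM, TIER_FREED };

struct fake_page {
	enum tier tier;
};

/* Which node should reclaimed pages from @node be demoted to?
 * Returns -1 (mirroring NUMA_NO_NODE) when there is no slower tier. */
static int demotion_target(int node)
{
	return node == 0 ? 1 : -1;
}

/* Model of "demote or migrate" a single page. Returns true if the
 * demotion attempt succeeded. */
static bool try_demote(struct fake_page *p, int node, bool migration_ok)
{
	int target = demotion_target(node);

	if (target < 0 || !migration_ok)
		return false;
	p->tier = TIER_PMEM;	/* migrated: contents preserved */
	return true;
}

/* Reclaim one page: demote when possible, otherwise fall back to
 * normal reclaim (discard), matching the "fall back to normal reclaim
 * when demotion fails" behavior described in the changelog. */
static void reclaim_one(struct fake_page *p, int node, bool migration_ok)
{
	if (!try_demote(p, node, migration_ok))
		p->tier = TIER_FREED;	/* contents thrown away */
}
```

In this sketch the data reclaimed from the fast node survives on the slow node instead of being discarded, while pages reclaimed from the slowest node (which has no demotion target) are freed as before — the same asymmetry the series introduces between tiers.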