Message ID | 20201007161749.4C56D1F1@viggo.jf.intel.com (mailing list archive)
---|---
State | New, archived
Series | Migrate Pages in lieu of discard
On Wed, Oct 07, 2020 at 09:17:49AM -0700, Dave Hansen wrote:
> 
> From: Keith Busch <kbusch@kernel.org>
> 
> Age and reclaim anonymous pages if a migration path is available. The
> node has other recourses for inactive anonymous pages beyond swap,
> 
> #Signed-off-by: Keith Busch <keith.busch@intel.com>
> Cc: Keith Busch <kbusch@kernel.org>
> [vishal: fixup the migration->demotion rename]
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Yang Shi <yang.shi@linux.alibaba.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: David Hildenbrand <david@redhat.com>

I have a question regarding this one.

It seems that we do have places where we read total_swap_pages directly and
other places where we use get_nr_swap_pages.
One seems to give the total number of swap pages, while the other gives the
number of free swap pages.

With this patch, we will use always the atomic version get_nr_swap_pages
from now on.
Is that ok? I guess so, but it might warrant a mention in the changelog?
E.g: age_active_anon seems to base one of its decisions on whether we have
swap (it seems it does not care if swap space is available).
On 10/29/20 1:14 AM, Oscar Salvador wrote:
> With this patch, we will use always the atomic version
> get_nr_swap_pages from now on. Is that ok? I guess so, but it might
> warrant a mention in the changelog?

I _think_ it's OK. But, you're right that it's a potential behavior
change that's not mentioned in the changelog.

I'll mention it in the changelog and see if I can dream up any other
practical implications from this change.

Thanks for taking a look!
On Thu, Oct 29, 2020 at 7:33 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 10/29/20 1:14 AM, Oscar Salvador wrote:
> > With this patch, we will use always the atomic version
> > get_nr_swap_pages from now on. Is that ok? I guess so, but it might
> > warrant a mention in the changelog?
>
> I _think_ it's OK. But, you're right that it's a potential behavior
> change that's not mentioned in the changelog.
>
> I'll mention it in the changelog and see if I can dream up any other
> practical implications from this change.

IMHO, we don't have to modify those two places at all. They are used
to rebalance the anon lru active/inactive ratio even if we did not try
to evict anon pages at all, so "total_swap_pages" is used instead of
checking swappiness and available swap space.

The changes may result in imbalanced anon lru.

>
> Thanks for taking a look!
>
On Thu, Oct 29, 2020 at 08:57:32AM -0700, Yang Shi wrote:
> IMHO, we don't have to modify those two places at all. They are used
> to rebalance the anon lru active/inactive ratio even if we did not try
> to evict anon pages at all, so "total_swap_pages" is used instead of
> checking swappiness and available swap space.
>
> The changes may result in imbalanced anon lru.

I might be missing something, so bear with me.

It is true that since we are only rebalancing the lists, we do not need to
check for swap space yet, but here we are also adding a new end-point where
we can migrate to in case of memory pressure.

So in case we can demote pages, it makes sense to proceed with the aging
and rebalancing regardless of whether we have swap in place, right?

But maybe the right procedure would be to perform some sort of the
following check in those two places:

	if (total_swap_pages || can_migrate_to_demote_node)
		proceed_with_rebalancing_or_aging

--
Oscar Salvador
SUSE L3
On Thu, Oct 29, 2020 at 12:08 PM osalvador <osalvador@suse.de> wrote:
>
> On Thu, Oct 29, 2020 at 08:57:32AM -0700, Yang Shi wrote:
> > IMHO, we don't have to modify those two places at all. They are used
> > to rebalance the anon lru active/inactive ratio even if we did not try
> > to evict anon pages at all, so "total_swap_pages" is used instead of
> > checking swappiness and available swap space.
> >
> > The changes may result in imbalanced anon lru.
>
> I might be missing something, so bear with me.
>
> It is true that since we are only rebalancing the lists, we do not need to
> check for swap space yet, but here we are also adding a new end-point where we
> can migrate to in case of memory pressure.
>
> So in case we can demote pages, it makes sense to proceed with the aging
> and rebalancing regardless of whether we have swap in place, right?

Yes, makes sense. I missed that point.

>
> But maybe the right procedure would be to perform some sort of the
> following check in those two places:
>
>	if (total_swap_pages || can_migrate_to_demote_node)
>		proceed_with_rebalancing_or_aging

Looks sane to me.

>
> --
> Oscar Salvador
> SUSE L3
diff -puN include/linux/node.h~0009-mm-vmscan-Consider-anonymous-pages-without-swap include/linux/node.h
--- a/include/linux/node.h~0009-mm-vmscan-Consider-anonymous-pages-without-swap	2020-10-07 09:15:33.390642436 -0700
+++ b/include/linux/node.h	2020-10-07 09:15:33.399642436 -0700
@@ -180,4 +180,13 @@ static inline void register_hugetlbfs_wi
 
 #define to_node(device) container_of(device, struct node, dev)
 
+#ifdef CONFIG_MIGRATION
+extern int next_demotion_node(int node);
+#else
+static inline int next_demotion_node(int node)
+{
+	return NUMA_NO_NODE;
+}
+#endif
+
 #endif /* _LINUX_NODE_H_ */
diff -puN mm/vmscan.c~0009-mm-vmscan-Consider-anonymous-pages-without-swap mm/vmscan.c
--- a/mm/vmscan.c~0009-mm-vmscan-Consider-anonymous-pages-without-swap	2020-10-07 09:15:33.392642436 -0700
+++ b/mm/vmscan.c	2020-10-07 09:15:33.400642436 -0700
@@ -290,6 +290,26 @@ static bool writeback_throttling_sane(st
 }
 #endif
 
+static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
+					  int node_id)
+{
+	/* Always age anon pages when we have swap */
+	if (memcg == NULL) {
+		if (get_nr_swap_pages() > 0)
+			return true;
+	} else {
+		if (mem_cgroup_get_nr_swap_pages(memcg) > 0)
+			return true;
+	}
+
+	/* Also age anon pages if we can auto-migrate them */
+	if (next_demotion_node(node_id) >= 0)
+		return true;
+
+	/* No way to reclaim anon pages */
+	return false;
+}
+
 /*
  * This misses isolated pages which are not accounted for to save counters.
  * As the data only determines if reclaim or compaction continues, it is
@@ -301,7 +321,7 @@ unsigned long zone_reclaimable_pages(str
 	nr = zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_FILE) +
 		zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_FILE);
-	if (get_nr_swap_pages() > 0)
+	if (can_reclaim_anon_pages(NULL, zone_to_nid(zone)))
 		nr += zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_ANON) +
 			zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_ANON);
 
@@ -2337,6 +2357,7 @@ enum scan_balance {
 static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 			   unsigned long *nr)
 {
+	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	unsigned long anon_cost, file_cost, total_cost;
 	int swappiness = mem_cgroup_swappiness(memcg);
@@ -2347,7 +2368,7 @@ static void get_scan_count(struct lruvec
 	enum lru_list lru;
 
 	/* If we have no swap space, do not bother scanning anon pages. */
-	if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0) {
+	if (!sc->may_swap || !can_reclaim_anon_pages(memcg, pgdat->node_id)) {
 		scan_balance = SCAN_FILE;
 		goto out;
 	}
@@ -2631,7 +2652,9 @@ static void shrink_lruvec(struct lruvec
 	 * Even if we did not try to evict anon pages at all, we want to
 	 * rebalance the anon lru active/inactive ratio.
 	 */
-	if (total_swap_pages && inactive_is_low(lruvec, LRU_INACTIVE_ANON))
+	if (can_reclaim_anon_pages(lruvec_memcg(lruvec),
+				   lruvec_pgdat(lruvec)->node_id) &&
+	    inactive_is_low(lruvec, LRU_INACTIVE_ANON))
 		shrink_active_list(SWAP_CLUSTER_MAX, lruvec,
 				   sc, LRU_ACTIVE_ANON);
 }
@@ -2701,7 +2724,7 @@ static inline bool should_continue_recla
 	 */
 	pages_for_compaction = compact_gap(sc->order);
 	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
-	if (get_nr_swap_pages() > 0)
+	if (can_reclaim_anon_pages(NULL, pgdat->node_id))
 		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
 
 	return inactive_lru_pages > pages_for_compaction;
@@ -3460,7 +3483,7 @@ static void age_active_anon(struct pglis
 	struct mem_cgroup *memcg;
 	struct lruvec *lruvec;
 
-	if (!total_swap_pages)
+	if (!can_reclaim_anon_pages(NULL, pgdat->node_id))
 		return;
 
 	lruvec = mem_cgroup_lruvec(NULL, pgdat);