Message ID | 20190802223930.30971-2-mike.kravetz@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Series | address hugetlb page allocation stalls |
On 8/3/19 12:39 AM, Mike Kravetz wrote:
> From: Hillf Danton <hdanton@sina.com>
>
> Address the issue of should_continue_reclaim continuing true too often
> for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
> This could happen during hugetlb page allocation causing stalls for
> minutes or hours.
>
> We can stop reclaiming pages if compaction reports it can make a progress.
> A code reshuffle is needed to do that.

> And it has side-effects, however,
> with allocation latencies in other cases but that would come at the cost
> of potential premature reclaim which has consequences of itself.

Based on Mel's longer explanation, can we clarify the wording here? e.g.:

There might be side-effect for other high-order allocations that would
potentially benefit from more reclaim before compaction for them to be
faster and less likely to stall, but the consequences of
premature/over-reclaim are considered worse.

> We can also bail out of reclaiming pages if we know that there are not
> enough inactive lru pages left to satisfy the costly allocation.
>
> We can give up reclaiming pages too if we see dryrun occur, with the
> certainty of plenty of inactive pages. IOW with dryrun detected, we are
> sure we have reclaimed as many pages as we could.
>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Hillf Danton <hdanton@sina.com>
> Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
> Acked-by: Mel Gorman <mgorman@suse.de>

Acked-by: Vlastimil Babka <vbabka@suse.cz>
I will send some followup cleanup.

There should be also Mike's SOB?

> ---
>  mm/vmscan.c | 28 +++++++++++++++-------------
>  1 file changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 47aa2158cfac..a386c5351592 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2738,18 +2738,6 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
>  			return false;
>  	}
>  
> -	/*
> -	 * If we have not reclaimed enough pages for compaction and the
> -	 * inactive lists are large enough, continue reclaiming
> -	 */
> -	pages_for_compaction = compact_gap(sc->order);
> -	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
> -	if (get_nr_swap_pages() > 0)
> -		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
> -	if (sc->nr_reclaimed < pages_for_compaction &&
> -			inactive_lru_pages > pages_for_compaction)
> -		return true;
> -
>  	/* If compaction would go ahead or the allocation would succeed, stop */
>  	for (z = 0; z <= sc->reclaim_idx; z++) {
>  		struct zone *zone = &pgdat->node_zones[z];
> @@ -2765,7 +2753,21 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
>  			;
>  		}
>  	}
> -	return true;
> +
> +	/*
> +	 * If we have not reclaimed enough pages for compaction and the
> +	 * inactive lists are large enough, continue reclaiming
> +	 */
> +	pages_for_compaction = compact_gap(sc->order);
> +	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
> +	if (get_nr_swap_pages() > 0)
> +		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
> +
> +	return inactive_lru_pages > pages_for_compaction &&
> +		/*
> +		 * avoid dryrun with plenty of inactive pages
> +		 */
> +		nr_scanned && nr_reclaimed;
>  }
>  
>  static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)
>
On 8/5/19 10:42 AM, Vlastimil Babka wrote:
> On 8/3/19 12:39 AM, Mike Kravetz wrote:
>> From: Hillf Danton <hdanton@sina.com>
>>
>> Address the issue of should_continue_reclaim continuing true too often
>> for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
>> This could happen during hugetlb page allocation causing stalls for
>> minutes or hours.
>>
>> We can stop reclaiming pages if compaction reports it can make a progress.
>> A code reshuffle is needed to do that.
>
>> And it has side-effects, however,
>> with allocation latencies in other cases but that would come at the cost
>> of potential premature reclaim which has consequences of itself.
>
> Based on Mel's longer explanation, can we clarify the wording here? e.g.:
>
> There might be side-effect for other high-order allocations that would
> potentially benefit from more reclaim before compaction for them to be
> faster and less likely to stall, but the consequences of
> premature/over-reclaim are considered worse.
>
>> We can also bail out of reclaiming pages if we know that there are not
>> enough inactive lru pages left to satisfy the costly allocation.
>>
>> We can give up reclaiming pages too if we see dryrun occur, with the
>> certainty of plenty of inactive pages. IOW with dryrun detected, we are
>> sure we have reclaimed as many pages as we could.
>>
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Signed-off-by: Hillf Danton <hdanton@sina.com>
>> Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
>> Acked-by: Mel Gorman <mgorman@suse.de>
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> I will send some followup cleanup.

How about this?
----8<----
From 0040b32462587171ad22395a56699cc036ad483f Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Mon, 5 Aug 2019 12:49:40 +0200
Subject: [PATCH] mm, reclaim: cleanup should_continue_reclaim()

After commit "mm, reclaim: make should_continue_reclaim perform dryrun
detection", closer look at the function shows, that nr_reclaimed == 0 means
the function will always return false. And since non-zero nr_reclaimed implies
non_zero nr_scanned, testing nr_scanned serves no purpose, and so does the
testing for __GFP_RETRY_MAYFAIL.

This patch thus cleans up the function to test only !nr_reclaimed upfront, and
remove the __GFP_RETRY_MAYFAIL test and nr_scanned parameter completely.
Comment is also updated, explaining that approximating "full LRU list has been
scanned" with nr_scanned == 0 didn't really work.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/vmscan.c | 43 ++++++++++++++-----------------------------
 1 file changed, 14 insertions(+), 29 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ad498b76e492..db3c9e06a888 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2582,7 +2582,6 @@ static bool in_reclaim_compaction(struct scan_control *sc)
  */
 static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 					unsigned long nr_reclaimed,
-					unsigned long nr_scanned,
 					struct scan_control *sc)
 {
 	unsigned long pages_for_compaction;
@@ -2593,28 +2592,18 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 	if (!in_reclaim_compaction(sc))
 		return false;
 
-	/* Consider stopping depending on scan and reclaim activity */
-	if (sc->gfp_mask & __GFP_RETRY_MAYFAIL) {
-		/*
-		 * For __GFP_RETRY_MAYFAIL allocations, stop reclaiming if the
-		 * full LRU list has been scanned and we are still failing
-		 * to reclaim pages. This full LRU scan is potentially
-		 * expensive but a __GFP_RETRY_MAYFAIL caller really wants to succeed
-		 */
-		if (!nr_reclaimed && !nr_scanned)
-			return false;
-	} else {
-		/*
-		 * For non-__GFP_RETRY_MAYFAIL allocations which can presumably
-		 * fail without consequence, stop if we failed to reclaim
-		 * any pages from the last SWAP_CLUSTER_MAX number of
-		 * pages that were scanned. This will return to the
-		 * caller faster at the risk reclaim/compaction and
-		 * the resulting allocation attempt fails
-		 */
-		if (!nr_reclaimed)
-			return false;
-	}
+	/*
+	 * Stop if we failed to reclaim any pages from the last SWAP_CLUSTER_MAX
+	 * number of pages that were scanned. This will return to the caller
+	 * with the risk reclaim/compaction and the resulting allocation attempt
+	 * fails. In the past we have tried harder for __GFP_RETRY_MAYFAIL
+	 * allocations through requiring that the full LRU list has been scanned
+	 * first, by assuming that zero delta of sc->nr_scanned means full LRU
+	 * scan, but that approximation was wrong, and there were corner cases
+	 * where always a non-zero amount of pages were scanned.
+	 */
+	if (!nr_reclaimed)
+		return false;
 
 	/* If compaction would go ahead or the allocation would succeed, stop */
 	for (z = 0; z <= sc->reclaim_idx; z++) {
@@ -2641,11 +2630,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 	if (get_nr_swap_pages() > 0)
 		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
 
-	return inactive_lru_pages > pages_for_compaction &&
-		/*
-		 * avoid dryrun with plenty of inactive pages
-		 */
-		nr_scanned && nr_reclaimed;
+	return inactive_lru_pages > pages_for_compaction;
 }
 
 static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)
@@ -2810,7 +2795,7 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 			wait_iff_congested(BLK_RW_ASYNC, HZ/10);
 
 	} while (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
-					 sc->nr_scanned - nr_scanned, sc));
+					 sc));
 
 	/*
 	 * Kswapd gives up on balancing particular nodes after too
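
The equivalence argued in the cleanup changelog can be checked with a small
userspace model. This is only a sketch, not kernel code: struct model_input,
before_cleanup(), after_cleanup() and count_mismatches() are hypothetical
names, the zone loop is collapsed into a single compaction_ready flag, and the
inactive LRU comparison into a single inactive_large flag. It assumes, as the
changelog does, that pages reclaimed in a pass were also scanned in that pass
(nr_reclaimed <= nr_scanned).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the state the kernel function consults. */
struct model_input {
	bool in_reclaim_compaction;  /* result of in_reclaim_compaction(sc) */
	bool retry_mayfail;          /* sc->gfp_mask & __GFP_RETRY_MAYFAIL */
	unsigned long nr_reclaimed;  /* pages reclaimed in the last pass */
	unsigned long nr_scanned;    /* pages scanned in the last pass */
	bool compaction_ready;       /* zone loop would return false */
	bool inactive_large;         /* inactive_lru_pages > pages_for_compaction */
};

/* Decision as it stands after the dryrun-detection patch. */
static bool before_cleanup(const struct model_input *in)
{
	if (!in->in_reclaim_compaction)
		return false;
	if (in->retry_mayfail) {
		if (!in->nr_reclaimed && !in->nr_scanned)
			return false;
	} else {
		if (!in->nr_reclaimed)
			return false;
	}
	if (in->compaction_ready)
		return false;
	return in->inactive_large && in->nr_scanned && in->nr_reclaimed;
}

/* Decision as proposed by the cleanup patch. */
static bool after_cleanup(const struct model_input *in)
{
	if (!in->in_reclaim_compaction)
		return false;
	if (!in->nr_reclaimed)
		return false;
	if (in->compaction_ready)
		return false;
	return in->inactive_large;
}

/* Enumerate every flag combination with 0 <= reclaimed <= scanned <= 2. */
static unsigned int count_mismatches(void)
{
	unsigned int mismatches = 0;

	for (int bits = 0; bits < 16; bits++) {
		for (unsigned long scanned = 0; scanned <= 2; scanned++) {
			for (unsigned long reclaimed = 0; reclaimed <= scanned; reclaimed++) {
				struct model_input in = {
					.in_reclaim_compaction = (bits & 1) != 0,
					.retry_mayfail = (bits & 2) != 0,
					.compaction_ready = (bits & 4) != 0,
					.inactive_large = (bits & 8) != 0,
					.nr_reclaimed = reclaimed,
					.nr_scanned = scanned,
				};

				if (before_cleanup(&in) != after_cleanup(&in))
					mismatches++;
			}
		}
	}
	return mismatches;
}
```

Under the stated assumption count_mismatches() returns 0, so the two forms
agree for every modelled input. Without the reclaimed <= scanned bound they
would differ (non-zero nr_reclaimed with zero nr_scanned), which is why the
changelog's "non-zero nr_reclaimed implies non_zero nr_scanned" matters.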
On 8/5/19 1:42 AM, Vlastimil Babka wrote:
> On 8/3/19 12:39 AM, Mike Kravetz wrote:
>> From: Hillf Danton <hdanton@sina.com>
>>
>> Address the issue of should_continue_reclaim continuing true too often
>> for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
>> This could happen during hugetlb page allocation causing stalls for
>> minutes or hours.
>>
>> We can stop reclaiming pages if compaction reports it can make a progress.
>> A code reshuffle is needed to do that.
>
>> And it has side-effects, however,
>> with allocation latencies in other cases but that would come at the cost
>> of potential premature reclaim which has consequences of itself.
>
> Based on Mel's longer explanation, can we clarify the wording here? e.g.:
>
> There might be side-effect for other high-order allocations that would
> potentially benefit from more reclaim before compaction for them to be
> faster and less likely to stall, but the consequences of
> premature/over-reclaim are considered worse.
>
>> We can also bail out of reclaiming pages if we know that there are not
>> enough inactive lru pages left to satisfy the costly allocation.
>>
>> We can give up reclaiming pages too if we see dryrun occur, with the
>> certainty of plenty of inactive pages. IOW with dryrun detected, we are
>> sure we have reclaimed as many pages as we could.
>>
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Signed-off-by: Hillf Danton <hdanton@sina.com>
>> Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
>> Acked-by: Mel Gorman <mgorman@suse.de>
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> I will send some followup cleanup.
>
> There should be also Mike's SOB?

Will do.
My apologies, the process of handling patches created by others is new to me.

Also, will incorporate Mel's explanation.
On 8/5/19 3:57 AM, Vlastimil Babka wrote:
> On 8/5/19 10:42 AM, Vlastimil Babka wrote:
>> On 8/3/19 12:39 AM, Mike Kravetz wrote:
>>> From: Hillf Danton <hdanton@sina.com>
>>>
>>> Address the issue of should_continue_reclaim continuing true too often
>>> for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
>>> This could happen during hugetlb page allocation causing stalls for
>>> minutes or hours.
>>>
>>> We can stop reclaiming pages if compaction reports it can make a progress.
>>> A code reshuffle is needed to do that.
>>
>>> And it has side-effects, however,
>>> with allocation latencies in other cases but that would come at the cost
>>> of potential premature reclaim which has consequences of itself.
>>
>> Based on Mel's longer explanation, can we clarify the wording here? e.g.:
>>
>> There might be side-effect for other high-order allocations that would
>> potentially benefit from more reclaim before compaction for them to be
>> faster and less likely to stall, but the consequences of
>> premature/over-reclaim are considered worse.
>>
>>> We can also bail out of reclaiming pages if we know that there are not
>>> enough inactive lru pages left to satisfy the costly allocation.
>>>
>>> We can give up reclaiming pages too if we see dryrun occur, with the
>>> certainty of plenty of inactive pages. IOW with dryrun detected, we are
>>> sure we have reclaimed as many pages as we could.
>>>
>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>> Cc: Mel Gorman <mgorman@suse.de>
>>> Cc: Michal Hocko <mhocko@kernel.org>
>>> Cc: Vlastimil Babka <vbabka@suse.cz>
>>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>>> Signed-off-by: Hillf Danton <hdanton@sina.com>
>>> Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
>>> Acked-by: Mel Gorman <mgorman@suse.de>
>>
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>> I will send some followup cleanup.
>
> How about this?
> ----8<----
> From 0040b32462587171ad22395a56699cc036ad483f Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Mon, 5 Aug 2019 12:49:40 +0200
> Subject: [PATCH] mm, reclaim: cleanup should_continue_reclaim()
>
> After commit "mm, reclaim: make should_continue_reclaim perform dryrun
> detection", closer look at the function shows, that nr_reclaimed == 0 means
> the function will always return false. And since non-zero nr_reclaimed implies
> non_zero nr_scanned, testing nr_scanned serves no purpose, and so does the
> testing for __GFP_RETRY_MAYFAIL.
>
> This patch thus cleans up the function to test only !nr_reclaimed upfront, and
> remove the __GFP_RETRY_MAYFAIL test and nr_scanned parameter completely.
> Comment is also updated, explaining that approximating "full LRU list has been
> scanned" with nr_scanned == 0 didn't really work.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Mike Kravetz <mike.kravetz@oracle.com>

Would you like me to add this to the series, or do you want to send later?
On 8/5/19 6:58 PM, Mike Kravetz wrote:
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>
> Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
>
> Would you like me to add this to the series, or do you want to send later?

Please add, thanks!
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 47aa2158cfac..a386c5351592 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2738,18 +2738,6 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 			return false;
 	}
 
-	/*
-	 * If we have not reclaimed enough pages for compaction and the
-	 * inactive lists are large enough, continue reclaiming
-	 */
-	pages_for_compaction = compact_gap(sc->order);
-	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
-	if (get_nr_swap_pages() > 0)
-		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
-	if (sc->nr_reclaimed < pages_for_compaction &&
-			inactive_lru_pages > pages_for_compaction)
-		return true;
-
 	/* If compaction would go ahead or the allocation would succeed, stop */
 	for (z = 0; z <= sc->reclaim_idx; z++) {
 		struct zone *zone = &pgdat->node_zones[z];
@@ -2765,7 +2753,21 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 			;
 		}
 	}
-	return true;
+
+	/*
+	 * If we have not reclaimed enough pages for compaction and the
+	 * inactive lists are large enough, continue reclaiming
+	 */
+	pages_for_compaction = compact_gap(sc->order);
+	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
+	if (get_nr_swap_pages() > 0)
+		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
+
+	return inactive_lru_pages > pages_for_compaction &&
+		/*
+		 * avoid dryrun with plenty of inactive pages
+		 */
+		nr_scanned && nr_reclaimed;
 }
 
 static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)
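
To make the stall from the changelog concrete, here is a rough userspace model
of the decision before and after this patch, applied to one made-up reclaim
pass. It is a sketch under assumptions, not the kernel implementation: struct
reclaim_state and all function names are hypothetical, in_reclaim_compaction()
is assumed to be true, the zone loop is reduced to a compaction_ready flag, and
the page counts in stalled_hugetlb_case() are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical snapshot of one pass through the shrink_node() loop. */
struct reclaim_state {
	bool retry_mayfail;                 /* __GFP_RETRY_MAYFAIL set */
	unsigned long delta_reclaimed;      /* nr_reclaimed argument */
	unsigned long delta_scanned;        /* nr_scanned argument */
	unsigned long total_reclaimed;      /* sc->nr_reclaimed */
	unsigned long pages_for_compaction; /* compact_gap(sc->order) */
	unsigned long inactive_lru_pages;   /* inactive file (+ anon) pages */
	bool compaction_ready;              /* zone loop would return false */
};

/* The scan/reclaim activity test shared by both versions. */
static bool activity_check_passes(const struct reclaim_state *s)
{
	if (s->retry_mayfail)
		return s->delta_reclaimed || s->delta_scanned;
	return s->delta_reclaimed != 0;
}

/* Before the patch: the inactive-list test runs first and can return true. */
static bool continue_before_patch(const struct reclaim_state *s)
{
	if (!activity_check_passes(s))
		return false;
	if (s->total_reclaimed < s->pages_for_compaction &&
	    s->inactive_lru_pages > s->pages_for_compaction)
		return true;
	if (s->compaction_ready)
		return false;
	return true;
}

/* After the patch: compaction is consulted first, then dryrun is detected. */
static bool continue_after_patch(const struct reclaim_state *s)
{
	if (!activity_check_passes(s))
		return false;
	if (s->compaction_ready)
		return false;
	return s->inactive_lru_pages > s->pages_for_compaction &&
	       s->delta_scanned && s->delta_reclaimed;
}

/* Pages keep being scanned, none are reclaimed, inactive lists are large. */
static struct reclaim_state stalled_hugetlb_case(void)
{
	return (struct reclaim_state){
		.retry_mayfail = true,
		.delta_reclaimed = 0,
		.delta_scanned = 32,
		.total_reclaimed = 0,
		.pages_for_compaction = 1024,
		.inactive_lru_pages = 100000,
		.compaction_ready = false,
	};
}
```

For the state returned by stalled_hugetlb_case(), continue_before_patch() keeps
answering true on every pass while continue_after_patch() answers false, which
is the "!nr_reclaimed and nr_scanned" case the changelog describes.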