Message ID | 20190724175014.9935-2-mike.kravetz@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fix hugetlb page allocation stalls | expand |
On Wed, Jul 24, 2019 at 10:50:12AM -0700, Mike Kravetz wrote: > From: Hillf Danton <hdanton@sina.com> > > Address the issue of should_continue_reclaim continuing true too often > for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned. > This could happen during hugetlb page allocation causing stalls for > minutes or hours. > > Restructure code so that false will be returned in this case even if > there are plenty of inactive pages. > > Need better description from Hillf Danton > Need SOB from Hillf Danton Agreed that the description could do with improvement. However, it makes sense that if compaction reports it can make progress that it is unnecessary to continue reclaiming. There might be side-effects with allocation latencies in other cases but that would come at the cost of potential premature reclaim which has consequences of itself. Overall I think it's an improvement so I'll ack it once it has a S-O-B.
> From: Hillf Danton <hdanton@sina.com> > Subject: [RFC PATCH 1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection > > Address the issue of should_continue_reclaim continuing true too often > for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned. > This could happen during hugetlb page allocation causing stalls for > minutes or hours. > > We can stop reclaiming pages if compaction reports it can make a progress. > A code reshuffle is needed to do that. And it has side-effects, however, > with allocation latencies in other cases but that would come at the cost > of potential premature reclaim which has consequences of itself. > > We can also bail out of reclaiming pages if we know that there are not > enough inactive lru pages left to satisfy the costly allocation. > > We can give up reclaiming pages too if we see dryrun occur, with the > certainty of plenty of inactive pages. IOW with dryrun detected, we are > sure we have reclaimed as many pages as we could. > > Cc: Mike Kravetz <mike.kravetz@oracle.com> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Michal Hocko <mhocko@kernel.org> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Signed-off-by: Hillf Danton <hdanton@sina.com> Acked-by: Mel Gorman <mgorman@suse.de> Thanks Hillf
On 7/26/19 9:40 AM, Hillf Danton wrote: > > On Thu, 25 Jul 2019 08:05:55 +0000 (UTC) Mel Gorman wrote: >> >> Agreed that the description could do with improvement. However, it >> makes sense that if compaction reports it can make progress that it is >> unnecessary to continue reclaiming. > > Thanks Mike and Mel. > > Hillf > ---8<--- > From: Hillf Danton <hdanton@sina.com> > Subject: [RFC PATCH 1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection > > Address the issue of should_continue_reclaim continuing true too often > for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned. > This could happen during hugetlb page allocation causing stalls for > minutes or hours. > > We can stop reclaiming pages if compaction reports it can make a progress. > A code reshuffle is needed to do that. And it has side-effects, however, > with allocation latencies in other cases but that would come at the cost > of potential premature reclaim which has consequences of itself. I don't really understand that paragraph, did Mel meant it like this? > We can also bail out of reclaiming pages if we know that there are not > enough inactive lru pages left to satisfy the costly allocation. > > We can give up reclaiming pages too if we see dryrun occur, with the > certainty of plenty of inactive pages. IOW with dryrun detected, we are > sure we have reclaimed as many pages as we could. > > Cc: Mike Kravetz <mike.kravetz@oracle.com> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Michal Hocko <mhocko@kernel.org> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Signed-off-by: Hillf Danton <hdanton@sina.com> I agree this is an improvement overall, but perhaps the patch does too many things at once. The reshuffle is one thing and makes sense. The change of the last return condition could perhaps be separate. Also AFAICS the ultimate result is that when nr_reclaimed == 0, the function will now always return false. Which makes the initial test for __GFP_RETRY_MAYFAIL and the comments there misleading. There will no longer be a full LRU scan guaranteed - as long as the scanned LRU chunk yields no reclaimed page, we abort. > --- > mm/vmscan.c | 28 +++++++++++++++------------- > 1 file changed, 15 insertions(+), 13 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index f4fd02a..484b6b1 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -2673,18 +2673,6 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat, > return false; > } > > - /* > - * If we have not reclaimed enough pages for compaction and the > - * inactive lists are large enough, continue reclaiming > - */ > - pages_for_compaction = compact_gap(sc->order); > - inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE); > - if (get_nr_swap_pages() > 0) > - inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON); > - if (sc->nr_reclaimed < pages_for_compaction && > - inactive_lru_pages > pages_for_compaction) > - return true; > - > /* If compaction would go ahead or the allocation would succeed, stop */ > for (z = 0; z <= sc->reclaim_idx; z++) { > struct zone *zone = &pgdat->node_zones[z]; > @@ -2700,7 +2688,21 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat, > ; > } > } > - return true; > + > + /* > + * If we have not reclaimed enough pages for compaction and the > + * inactive lists are large enough, continue reclaiming > + */ > + pages_for_compaction = compact_gap(sc->order); > + inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE); > + if (get_nr_swap_pages() > 0) > + inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON); > + > + return inactive_lru_pages > pages_for_compaction && > + /* > + * avoid dryrun with plenty of inactive pages > + */ > + nr_scanned && nr_reclaimed; > } > > static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg) > -- >
On Wed, Jul 31, 2019 at 01:08:44PM +0200, Vlastimil Babka wrote: > On 7/26/19 9:40 AM, Hillf Danton wrote: > > > > On Thu, 25 Jul 2019 08:05:55 +0000 (UTC) Mel Gorman wrote: > >> > >> Agreed that the description could do with improvement. However, it > >> makes sense that if compaction reports it can make progress that it is > >> unnecessary to continue reclaiming. > > > > Thanks Mike and Mel. > > > > Hillf > > ---8<--- > > From: Hillf Danton <hdanton@sina.com> > > Subject: [RFC PATCH 1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection > > > > Address the issue of should_continue_reclaim continuing true too often > > for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned. > > This could happen during hugetlb page allocation causing stalls for > > minutes or hours. > > > > We can stop reclaiming pages if compaction reports it can make a progress. > > A code reshuffle is needed to do that. And it has side-effects, however, > > with allocation latencies in other cases but that would come at the cost > > of potential premature reclaim which has consequences of itself. > > I don't really understand that paragraph, did Mel meant it like this? > Fundamentally, the balancing act is between a) reclaiming more now so that compaction is more likely to succeed or b) keep pages resident to avoid refaulting. With a) high order allocations are faster, less likely to stall and more likely to succeed. However, it can also prematurely reclaim pages and free more memory than is necessary for compaction to succeed in a reasonable amount of time. We also know from testing that it can hit corner cases with hugetlbfs where stalls happen for prolonged periods of time anyway and the series overall is known to fix those stalls. > > Cc: Vlastimil Babka <vbabka@suse.cz> > > Cc: Johannes Weiner <hannes@cmpxchg.org> > > Signed-off-by: Hillf Danton <hdanton@sina.com> > > I agree this is an improvement overall, but perhaps the patch does too > many things at once. The reshuffle is one thing and makes sense. The > change of the last return condition could perhaps be separate. Also > AFAICS the ultimate result is that when nr_reclaimed == 0, the function > will now always return false. Which makes the initial test for > __GFP_RETRY_MAYFAIL and the comments there misleading. There will no > longer be a full LRU scan guaranteed - as long as the scanned LRU chunk > yields no reclaimed page, we abort. > I've no strong feelings on whether it is worth splitting the patch. In my mind it's more or less doing one thing even though the one thing is a relatively high-level problem.
On 7/31/19 4:08 AM, Vlastimil Babka wrote: > > I agree this is an improvement overall, but perhaps the patch does too > many things at once. The reshuffle is one thing and makes sense. The > change of the last return condition could perhaps be separate. Also > AFAICS the ultimate result is that when nr_reclaimed == 0, the function > will now always return false. Which makes the initial test for > __GFP_RETRY_MAYFAIL and the comments there misleading. There will no > longer be a full LRU scan guaranteed - as long as the scanned LRU chunk > yields no reclaimed page, we abort. Can someone help me understand why nr_scanned == 0 guarantees a full LRU scan? FWICS, nr_scanned used in this context is only incremented in shrink_page_list and potentially shrink_zones. In the stall case I am looking at, there are MANY cases in which nr_scanned is only a few pages and none of those are reclaimed. Can we not get nr_scanned == 0 on an arbitrary chunk of the LRU? I must be missing something, because I do not see how nr_scanned == 0 guarantees a full scan.
On 7/31/19 11:11 PM, Mike Kravetz wrote: > On 7/31/19 4:08 AM, Vlastimil Babka wrote: >> >> I agree this is an improvement overall, but perhaps the patch does too >> many things at once. The reshuffle is one thing and makes sense. The >> change of the last return condition could perhaps be separate. Also >> AFAICS the ultimate result is that when nr_reclaimed == 0, the function >> will now always return false. Which makes the initial test for >> __GFP_RETRY_MAYFAIL and the comments there misleading. There will no >> longer be a full LRU scan guaranteed - as long as the scanned LRU chunk >> yields no reclaimed page, we abort. > > Can someone help me understand why nr_scanned == 0 guarantees a full > LRU scan? FWICS, nr_scanned used in this context is only incremented > in shrink_page_list and potentially shrink_zones. In the stall case I > am looking at, there are MANY cases in which nr_scanned is only a few > pages and none of those are reclaimed. > > Can we not get nr_scanned == 0 on an arbitrary chunk of the LRU? > > I must be missing something, because I do not see how nr_scanned == 0 > guarantees a full scan. Yeah, seems like it doesn't. More reasons to update/remove the comment. Can be a followup cleanup if you don't want to block the series.
diff --git a/mm/vmscan.c b/mm/vmscan.c index f4fd02ae233e..484b6b1a954e 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2673,18 +2673,6 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat, return false; } - /* - * If we have not reclaimed enough pages for compaction and the - * inactive lists are large enough, continue reclaiming - */ - pages_for_compaction = compact_gap(sc->order); - inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE); - if (get_nr_swap_pages() > 0) - inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON); - if (sc->nr_reclaimed < pages_for_compaction && - inactive_lru_pages > pages_for_compaction) - return true; - /* If compaction would go ahead or the allocation would succeed, stop */ for (z = 0; z <= sc->reclaim_idx; z++) { struct zone *zone = &pgdat->node_zones[z]; @@ -2700,7 +2688,21 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat, ; } } - return true; + + /* + * If we have not reclaimed enough pages for compaction and the + * inactive lists are large enough, continue reclaiming + */ + pages_for_compaction = compact_gap(sc->order); + inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE); + if (get_nr_swap_pages() > 0) + inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON); + + return inactive_lru_pages > pages_for_compaction && + /* + * avoid dryrun with plenty of inactive pages + */ + nr_scanned && nr_reclaimed; } static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)
From: Hillf Danton <hdanton@sina.com> Address the issue of should_continue_reclaim continuing true too often for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned. This could happen during hugetlb page allocation causing stalls for minutes or hours. Restructure code so that false will be returned in this case even if there are plenty of inactive pages. Need better description from Hillf Danton Need SOB from Hillf Danton --- mm/vmscan.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-)