
[RFC,1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection

Message ID: 20190724175014.9935-2-mike.kravetz@oracle.com (mailing list archive)
State: New, archived
Series: fix hugetlb page allocation stalls

Commit Message

Mike Kravetz July 24, 2019, 5:50 p.m. UTC
From: Hillf Danton <hdanton@sina.com>

Address the issue of should_continue_reclaim returning true too often
for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
This could happen during hugetlb page allocation causing stalls for
minutes or hours.

Restructure code so that false will be returned in this case even if
there are plenty of inactive pages.

Need better description from Hillf Danton
Need SOB from Hillf Danton
---
 mm/vmscan.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

Comments

Mel Gorman July 25, 2019, 8:05 a.m. UTC | #1
On Wed, Jul 24, 2019 at 10:50:12AM -0700, Mike Kravetz wrote:
> From: Hillf Danton <hdanton@sina.com>
> 
> Address the issue of should_continue_reclaim returning true too often
> for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
> This could happen during hugetlb page allocation causing stalls for
> minutes or hours.
> 
> Restructure code so that false will be returned in this case even if
> there are plenty of inactive pages.
> 
> Need better description from Hillf Danton
> Need SOB from Hillf Danton

Agreed that the description could do with improvement. However, it
makes sense that if compaction reports it can make progress, it is
unnecessary to continue reclaiming. There might be side-effects on
allocation latencies in other cases, but the alternative would come at
the cost of potential premature reclaim, which has consequences of its
own. Overall I think it's an improvement so I'll ack it once it has a
S-O-B.
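
For context, the check Mel is referring to is the "If compaction would go
ahead or the allocation would succeed, stop" loop that the patch keeps in
place (its body is elided to a bare ";" in the diff context). Roughly, in
the v5.2-era mm/vmscan.c it reads as below; treat this as a sketch
reconstructed from that code, not a verbatim quote:

	/* If compaction would go ahead or the allocation would succeed, stop */
	for (z = 0; z <= sc->reclaim_idx; z++) {
		struct zone *zone = &pgdat->node_zones[z];

		if (!managed_zone(zone))
			continue;

		switch (compaction_suitable(zone, sc->order, 0, sc->reclaim_idx)) {
		case COMPACT_SUCCESS:
		case COMPACT_CONTINUE:
			/* compaction can make progress, so stop reclaiming */
			return false;
		default:
			/* keep checking the remaining zones */
			;
		}
	}

The patch moves the pages_for_compaction/inactive_lru_pages heuristic below
this loop, so a positive compaction_suitable() verdict now short-circuits it.
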
Mel Gorman July 26, 2019, 8:12 a.m. UTC | #2
> From: Hillf Danton <hdanton@sina.com>
> Subject: [RFC PATCH 1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection
> 
> Address the issue of should_continue_reclaim returning true too often
> for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
> This could happen during hugetlb page allocation causing stalls for
> minutes or hours.
> 
> We can stop reclaiming pages if compaction reports it can make a progress.
> A code reshuffle is needed to do that. And it has side-effects, however,
> with allocation latencies in other cases but that would come at the cost
> of potential premature reclaim which has consequences of itself.
> 
> We can also bail out of reclaiming pages if we know that there are not
> enough inactive lru pages left to satisfy the costly allocation.
> 
> We can give up reclaiming pages too if we see dryrun occur, with the
> certainty of plenty of inactive pages. IOW with dryrun detected, we are
> sure we have reclaimed as many pages as we could.
> 
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Hillf Danton <hdanton@sina.com>

Acked-by: Mel Gorman <mgorman@suse.de>

Thanks Hillf
Vlastimil Babka July 31, 2019, 11:08 a.m. UTC | #3
On 7/26/19 9:40 AM, Hillf Danton wrote:
> 
> On Thu, 25 Jul 2019 08:05:55 +0000 (UTC) Mel Gorman wrote:
>>
>> Agreed that the description could do with improvement. However, it
>> makes sense that if compaction reports it can make progress that it is
>> unnecessary to continue reclaiming.
> 
> Thanks Mike and Mel.
> 
> Hillf
> ---8<---
> From: Hillf Danton <hdanton@sina.com>
> Subject: [RFC PATCH 1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection
> 
> Address the issue of should_continue_reclaim returning true too often
> for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
> This could happen during hugetlb page allocation causing stalls for
> minutes or hours.
> 
> We can stop reclaiming pages if compaction reports it can make a progress.
> A code reshuffle is needed to do that. And it has side-effects, however,
> with allocation latencies in other cases but that would come at the cost
> of potential premature reclaim which has consequences of itself.

I don't really understand that paragraph, did Mel mean it like this?

> We can also bail out of reclaiming pages if we know that there are not
> enough inactive lru pages left to satisfy the costly allocation.
> 
> We can give up reclaiming pages too if we see dryrun occur, with the
> certainty of plenty of inactive pages. IOW with dryrun detected, we are
> sure we have reclaimed as many pages as we could.
> 
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Hillf Danton <hdanton@sina.com>

I agree this is an improvement overall, but perhaps the patch does too
many things at once. The reshuffle is one thing and makes sense. The
change of the last return condition could perhaps be separate. Also
AFAICS the ultimate result is that when nr_reclaimed == 0, the function
will now always return false. Which makes the initial test for
__GFP_RETRY_MAYFAIL and the comments there misleading. There will no
longer be a full LRU scan guaranteed - as long as the scanned LRU chunk
yields no reclaimed page, we abort.
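
For readers without the tree at hand: the "initial test for
__GFP_RETRY_MAYFAIL" is the early-return block at the top of
should_continue_reclaim(), which this patch leaves untouched. A sketch of
that pre-patch block, reconstructed approximately from the v5.2-era
mm/vmscan.c (the comment wording may differ slightly):

	/* Consider stopping depending on scan and reclaim activity */
	if (sc->gfp_mask & __GFP_RETRY_MAYFAIL) {
		/*
		 * For __GFP_RETRY_MAYFAIL allocations, stop reclaiming if the
		 * full LRU list has been scanned and we are still failing
		 * to reclaim pages. This full LRU scan is potentially
		 * expensive but a __GFP_RETRY_MAYFAIL caller really wants
		 * to succeed.
		 */
		if (!nr_reclaimed && !nr_scanned)
			return false;
	} else {
		/*
		 * For non-__GFP_RETRY_MAYFAIL allocations which can presumably
		 * fail without consequence, stop if we failed to reclaim any
		 * pages from the last SWAP_CLUSTER_MAX number of pages that
		 * were scanned.
		 */
		if (!nr_reclaimed)
			return false;
	}

Since the new final return also requires nr_reclaimed to be non-zero, the
function now ends up returning false whenever nr_reclaimed == 0 regardless
of which branch is taken, which is why the "full LRU list has been scanned"
comment becomes misleading.
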

> ---
>  mm/vmscan.c | 28 +++++++++++++++-------------
>  1 file changed, 15 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f4fd02a..484b6b1 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2673,18 +2673,6 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
>  			return false;
>  	}
>  
> -	/*
> -	 * If we have not reclaimed enough pages for compaction and the
> -	 * inactive lists are large enough, continue reclaiming
> -	 */
> -	pages_for_compaction = compact_gap(sc->order);
> -	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
> -	if (get_nr_swap_pages() > 0)
> -		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
> -	if (sc->nr_reclaimed < pages_for_compaction &&
> -			inactive_lru_pages > pages_for_compaction)
> -		return true;
> -
>  	/* If compaction would go ahead or the allocation would succeed, stop */
>  	for (z = 0; z <= sc->reclaim_idx; z++) {
>  		struct zone *zone = &pgdat->node_zones[z];
> @@ -2700,7 +2688,21 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
>  			;
>  		}
>  	}
> -	return true;
> +
> +	/*
> +	 * If we have not reclaimed enough pages for compaction and the
> +	 * inactive lists are large enough, continue reclaiming
> +	 */
> +	pages_for_compaction = compact_gap(sc->order);
> +	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
> +	if (get_nr_swap_pages() > 0)
> +		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
> +
> +	return inactive_lru_pages > pages_for_compaction &&
> +		/*
> +		 * avoid dryrun with plenty of inactive pages
> +		 */
> +		nr_scanned && nr_reclaimed;
>  }
>  
>  static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)
> --
>
Mel Gorman July 31, 2019, 12:25 p.m. UTC | #4
On Wed, Jul 31, 2019 at 01:08:44PM +0200, Vlastimil Babka wrote:
> On 7/26/19 9:40 AM, Hillf Danton wrote:
> > 
> > On Thu, 25 Jul 2019 08:05:55 +0000 (UTC) Mel Gorman wrote:
> >>
> >> Agreed that the description could do with improvement. However, it
> >> makes sense that if compaction reports it can make progress that it is
> >> unnecessary to continue reclaiming.
> > 
> > Thanks Mike and Mel.
> > 
> > Hillf
> > ---8<---
> > From: Hillf Danton <hdanton@sina.com>
> > Subject: [RFC PATCH 1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection
> > 
> > Address the issue of should_continue_reclaim returning true too often
> > for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
> > This could happen during hugetlb page allocation causing stalls for
> > minutes or hours.
> > 
> > We can stop reclaiming pages if compaction reports it can make a progress.
> > A code reshuffle is needed to do that. And it has side-effects, however,
> > with allocation latencies in other cases but that would come at the cost
> > of potential premature reclaim which has consequences of itself.
> 
> I don't really understand that paragraph, did Mel mean it like this?
> 

Fundamentally, the balancing act is between a) reclaiming more now so
that compaction is more likely to succeed and b) keeping pages resident
to avoid refaulting.

With a) high order allocations are faster, less likely to stall and more
likely to succeed. However, it can also prematurely reclaim pages and free
more memory than is necessary for compaction to succeed in a reasonable
amount of time. We also know from testing that it can hit corner cases
with hugetlbfs where stalls happen for prolonged periods of time anyway
and the series overall is known to fix those stalls.

> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: Hillf Danton <hdanton@sina.com>
> 
> I agree this is an improvement overall, but perhaps the patch does too
> many things at once. The reshuffle is one thing and makes sense. The
> change of the last return condition could perhaps be separate. Also
> AFAICS the ultimate result is that when nr_reclaimed == 0, the function
> will now always return false. Which makes the initial test for
> __GFP_RETRY_MAYFAIL and the comments there misleading. There will no
> longer be a full LRU scan guaranteed - as long as the scanned LRU chunk
> yields no reclaimed page, we abort.
> 

I've no strong feelings on whether it is worth splitting the patch. In
my mind it's more or less doing one thing even though the one thing is a
relatively high-level problem.
Mike Kravetz July 31, 2019, 9:11 p.m. UTC | #5
On 7/31/19 4:08 AM, Vlastimil Babka wrote:
> 
> I agree this is an improvement overall, but perhaps the patch does too
> many things at once. The reshuffle is one thing and makes sense. The
> change of the last return condition could perhaps be separate. Also
> AFAICS the ultimate result is that when nr_reclaimed == 0, the function
> will now always return false. Which makes the initial test for
> __GFP_RETRY_MAYFAIL and the comments there misleading. There will no
> longer be a full LRU scan guaranteed - as long as the scanned LRU chunk
> yields no reclaimed page, we abort.

Can someone help me understand why nr_scanned == 0 guarantees a full
LRU scan?  FWICS, nr_scanned used in this context is only incremented
in shrink_page_list and potentially shrink_zones.  In the stall case I
am looking at, there are MANY cases in which nr_scanned is only a few
pages and none of those are reclaimed.

Can we not get nr_scanned == 0 on an arbitrary chunk of the LRU?

I must be missing something, because I do not see how nr_scanned == 0
guarantees a full scan.
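
For context on what nr_scanned covers here: should_continue_reclaim() is
called from the reclaim loop in shrink_node() with per-iteration deltas, so
both counters describe only the most recent reclaim cycle rather than a
whole-LRU pass. A heavily trimmed sketch of that call site, reconstructed
approximately from the v5.2-era mm/vmscan.c:

	/* inside shrink_node(), heavily trimmed */
	do {
		/* snapshot the running totals before this reclaim cycle */
		nr_reclaimed = sc->nr_reclaimed;
		nr_scanned = sc->nr_scanned;

		/*
		 * ... walk the memcgs and shrink their LRU lists; this bumps
		 * sc->nr_scanned as pages are isolated and sc->nr_reclaimed
		 * as pages are actually freed ...
		 */
	} while (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
					 sc->nr_scanned - nr_scanned, sc));
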
Vlastimil Babka Aug. 1, 2019, 8:44 a.m. UTC | #6
On 7/31/19 11:11 PM, Mike Kravetz wrote:
> On 7/31/19 4:08 AM, Vlastimil Babka wrote:
>>
>> I agree this is an improvement overall, but perhaps the patch does too
>> many things at once. The reshuffle is one thing and makes sense. The
>> change of the last return condition could perhaps be separate. Also
>> AFAICS the ultimate result is that when nr_reclaimed == 0, the function
>> will now always return false. Which makes the initial test for
>> __GFP_RETRY_MAYFAIL and the comments there misleading. There will no
>> longer be a full LRU scan guaranteed - as long as the scanned LRU chunk
>> yields no reclaimed page, we abort.
> 
> Can someone help me understand why nr_scanned == 0 guarantees a full
> LRU scan?  FWICS, nr_scanned used in this context is only incremented
> in shrink_page_list and potentially shrink_zones.  In the stall case I
> am looking at, there are MANY cases in which nr_scanned is only a few
> pages and none of those are reclaimed.
> 
> Can we not get nr_scanned == 0 on an arbitrary chunk of the LRU?
> 
> I must be missing something, because I do not see how nr_scanned == 0
> guarantees a full scan.

Yeah, seems like it doesn't. More reasons to update/remove the comment.
Can be a followup cleanup if you don't want to block the series.

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f4fd02ae233e..484b6b1a954e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2673,18 +2673,6 @@  static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 			return false;
 	}
 
-	/*
-	 * If we have not reclaimed enough pages for compaction and the
-	 * inactive lists are large enough, continue reclaiming
-	 */
-	pages_for_compaction = compact_gap(sc->order);
-	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
-	if (get_nr_swap_pages() > 0)
-		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
-	if (sc->nr_reclaimed < pages_for_compaction &&
-			inactive_lru_pages > pages_for_compaction)
-		return true;
-
 	/* If compaction would go ahead or the allocation would succeed, stop */
 	for (z = 0; z <= sc->reclaim_idx; z++) {
 		struct zone *zone = &pgdat->node_zones[z];
@@ -2700,7 +2688,21 @@  static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 			;
 		}
 	}
-	return true;
+
+	/*
+	 * If we have not reclaimed enough pages for compaction and the
+	 * inactive lists are large enough, continue reclaiming
+	 */
+	pages_for_compaction = compact_gap(sc->order);
+	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
+	if (get_nr_swap_pages() > 0)
+		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
+
+	return inactive_lru_pages > pages_for_compaction &&
+		/*
+		 * avoid dryrun with plenty of inactive pages
+		 */
+		nr_scanned && nr_reclaimed;
 }
 
 static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)
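
As one concrete number for the final return condition above: compact_gap()
in the mm/internal.h of that era is simply twice the allocation size, so the
threshold for a hugetlb allocation is modest. A sketch, with the order-9
example being an illustration rather than anything from the thread:

	/* mm/internal.h: the headroom over the watermark that compaction wants */
	static inline unsigned long compact_gap(unsigned int order)
	{
		return 2UL << order;
	}

	/*
	 * Example: a 2 MB hugetlb page on x86-64 is order 9, so
	 * pages_for_compaction = compact_gap(9) = 1024 pages (4 MB).
	 * With this patch, reclaim continues only while the inactive lists
	 * exceed that and the previous cycle both scanned and reclaimed
	 * something.
	 */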