
mm/compaction: use proper zoneid for compaction_suitable()

Message ID 1564062621-8105-1-git-send-email-laoar.shao@gmail.com (mailing list archive)
State New, archived
Series: mm/compaction: use proper zoneid for compaction_suitable()

Commit Message

Yafang Shao July 25, 2019, 1:50 p.m. UTC
There are currently three compaction paths:
- direct compaction
- kcompactd compaction
- proc-triggered compaction
All of these paths use compaction_suitable() to check whether a zone is
suitable for compaction.

There are some issues around the usage of compaction_suitable().
We don't use the proper zoneid in kcompactd_node_suitable() when trying
to wake up kcompactd. In the kcompactd compaction path, we call
compaction_suitable() twice, and the zoneid isn't proper in the second call.
For proc-triggered compaction, the classzone_idx is always zero.

In order to fix these issues, I change the type of classzone_idx in
struct compact_control from const int to int and assign the proper zoneid
before calling compact_zone().

This patch also fixes some comments in struct compact_control, as these
fields are not only for the direct compactor but for all compactors.

Fixes: ebff398017c6 ("mm, compaction: pass classzone_idx and alloc_flags to watermark checking")
Fixes: 698b1b30642f ("mm, compaction: introduce kcompactd")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Yafang Shao <shaoyafang@didiglobal.com>
---
 mm/compaction.c | 12 +++++-------
 mm/internal.h   | 10 +++++-----
 2 files changed, 10 insertions(+), 12 deletions(-)

Comments

Mel Gorman July 26, 2019, 7:09 a.m. UTC | #1
On Thu, Jul 25, 2019 at 09:50:21AM -0400, Yafang Shao wrote:
> By now there're three compaction paths,
> - direct compaction
> - kcompactd compcation
> - proc triggered compaction
> When we do compaction in all these paths, we will use compaction_suitable()
> to check whether a zone is suitable to do compaction.
> 
> There're some issues around the usage of compaction_suitable().
> We don't use the proper zoneid in kcompactd_node_suitable() when try to
> wakeup kcompactd. In the kcompactd compaction paths, we call
> compaction_suitable() twice and the zoneid isn't proper in the second call.
> For proc triggered compaction, the classzone_idx is always zero.
> 
> In order to fix these issues, I change the type of classzone_idx in the
> struct compact_control from const int to int and assign the proper zoneid
> before calling compact_zone().
> 

What is actually fixed by this?

> This patch also fixes some comments in struct compact_control, as these
> fields are not only for direct compactor but also for all other compactors.
> 
> Fixes: ebff398017c6("mm, compaction: pass classzone_idx and alloc_flags to watermark checking")
> Fixes: 698b1b30642f("mm, compaction: introduce kcompactd")
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Yafang Shao <shaoyafang@didiglobal.com>
> ---
>  mm/compaction.c | 12 +++++-------
>  mm/internal.h   | 10 +++++-----
>  2 files changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ac4ead0..984dea7 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2425,6 +2425,7 @@ static void compact_node(int nid)
>  			continue;
>  
>  		cc.zone = zone;
> +		cc.classzone_idx = zoneid;
>  
>  		compact_zone(&cc, NULL);
>  
> @@ -2508,7 +2509,7 @@ static bool kcompactd_node_suitable(pg_data_t *pgdat)
>  			continue;
>  
>  		if (compaction_suitable(zone, pgdat->kcompactd_max_order, 0,
> -					classzone_idx) == COMPACT_CONTINUE)
> +					zoneid) == COMPACT_CONTINUE)
>  			return true;
>  	}
>  

This is a semantic change. The use of classzone_idx here, and not
zoneid, is so that the watermark check takes the lowmem reserves
into account in the __zone_watermark_ok() check. This means that
compaction is more likely to proceed but is not necessarily correct.
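
To make the lowmem-reserve point concrete, here is a hypothetical, heavily
simplified model of the reserve handling in __zone_watermark_ok(); the struct
and numbers are invented for illustration and are not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NR_ZONES 4

/* toy stand-in for struct zone: free pages plus per-classzone reserves */
struct zone_model {
	long free_pages;
	/* pages held back from allocations classed for higher zones */
	long lowmem_reserve[MAX_NR_ZONES];
};

/* the check charges the reserve kept for the caller's classzone_idx,
 * so a higher classzone_idx makes the same zone harder to pass */
static bool watermark_ok(const struct zone_model *z, long mark,
			 int classzone_idx)
{
	return z->free_pages > mark + z->lowmem_reserve[classzone_idx];
}
```

With an invented zone holding 1000 free pages and a 900-page reserve against
classzone_idx 3, the check passes for index 0 but fails for index 3: passing
the zone's own zoneid instead of the allocation's classzone_idx skips that
reserve, which is why compaction becomes more likely to proceed.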

> @@ -2526,7 +2527,6 @@ static void kcompactd_do_work(pg_data_t *pgdat)
>  	struct compact_control cc = {
>  		.order = pgdat->kcompactd_max_order,
>  		.search_order = pgdat->kcompactd_max_order,
> -		.classzone_idx = pgdat->kcompactd_classzone_idx,
>  		.mode = MIGRATE_SYNC_LIGHT,
>  		.ignore_skip_hint = false,
>  		.gfp_mask = GFP_KERNEL,
> @@ -2535,7 +2535,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
>  							cc.classzone_idx);
>  	count_compact_event(KCOMPACTD_WAKE);
>  
> -	for (zoneid = 0; zoneid <= cc.classzone_idx; zoneid++) {
> +	for (zoneid = 0; zoneid <= pgdat->kcompactd_classzone_idx; zoneid++) {
>  		int status;
>  
>  		zone = &pgdat->node_zones[zoneid];

This variable can be updated by a wakeup while the loop is executing
making the loop more difficult to reason about given the exit conditions
can change.
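
The hazard can be sketched in plain C (an illustrative toy, not kernel code;
simulated_wakeup() stands in for a concurrent wakeup_kcompactd() call):

```c
/* stands in for pgdat->kcompactd_classzone_idx, written by wakers */
static int shared_idx;

/* simulate a wakeup landing mid-loop and raising the requested index */
static void simulated_wakeup(void)
{
	shared_idx = 3;
}

/* reading the shared bound every iteration: the wakeup stretches the loop */
static int iterations_live_bound(void)
{
	int n = 0;

	shared_idx = 1;
	for (int zoneid = 0; zoneid <= shared_idx; zoneid++) {
		if (zoneid == 1)
			simulated_wakeup();	/* exit condition just moved */
		n++;
	}
	return n;
}

/* snapshotting the bound first: the amount of work is fixed up front */
static int iterations_snapshot(void)
{
	int n = 0;

	shared_idx = 1;
	int classzone_idx = shared_idx;	/* local snapshot */
	for (int zoneid = 0; zoneid <= classzone_idx; zoneid++) {
		if (zoneid == 1)
			simulated_wakeup();	/* no effect on this loop */
		n++;
	}
	return n;
}
```

The live-bound variant runs four iterations where the snapshot variant runs
two, which is exactly why looping directly on pgdat->kcompactd_classzone_idx
is harder to reason about.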

Please explain what exactly this patch is fixing and why it should be
done because it currently appears to be making a number of subtle
changes without justification.
Yafang Shao July 26, 2019, 10:06 a.m. UTC | #2
On Fri, Jul 26, 2019 at 3:09 PM Mel Gorman <mgorman@techsingularity.net> wrote:
>
> On Thu, Jul 25, 2019 at 09:50:21AM -0400, Yafang Shao wrote:
> > By now there're three compaction paths,
> > - direct compaction
> > - kcompactd compcation
> > - proc triggered compaction
> > When we do compaction in all these paths, we will use compaction_suitable()
> > to check whether a zone is suitable to do compaction.
> >
> > There're some issues around the usage of compaction_suitable().
> > We don't use the proper zoneid in kcompactd_node_suitable() when try to
> > wakeup kcompactd. In the kcompactd compaction paths, we call
> > compaction_suitable() twice and the zoneid isn't proper in the second call.
> > For proc triggered compaction, the classzone_idx is always zero.
> >
> > In order to fix these issues, I change the type of classzone_idx in the
> > struct compact_control from const int to int and assign the proper zoneid
> > before calling compact_zone().
> >
>
> What is actually fixed by this?
>

Recently there was a page allocation failure on our server because
compaction couldn't satisfy it.
This issue is not reproducible, so I had to read the compaction code and
look for possible solutions.
While reading the compaction code, I found what looked like misuse of
compaction_suitable().
But after you pointed it out, I see that I missed something.
The classzone_idx should represent the alloc request; otherwise we may
do unnecessary compaction on a zone.
Thanks a lot for your explanation.

Hi Andrew,

Please drop this patch. Sorry about that.
I will think about it more.

> > This patch also fixes some comments in struct compact_control, as these
> > fields are not only for direct compactor but also for all other compactors.
> >
> > Fixes: ebff398017c6("mm, compaction: pass classzone_idx and alloc_flags to watermark checking")
> > Fixes: 698b1b30642f("mm, compaction: introduce kcompactd")
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Yafang Shao <shaoyafang@didiglobal.com>
> > ---
> >  mm/compaction.c | 12 +++++-------
> >  mm/internal.h   | 10 +++++-----
> >  2 files changed, 10 insertions(+), 12 deletions(-)
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index ac4ead0..984dea7 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -2425,6 +2425,7 @@ static void compact_node(int nid)
> >                       continue;
> >
> >               cc.zone = zone;
> > +             cc.classzone_idx = zoneid;
> >
> >               compact_zone(&cc, NULL);
> >
> > @@ -2508,7 +2509,7 @@ static bool kcompactd_node_suitable(pg_data_t *pgdat)
> >                       continue;
> >
> >               if (compaction_suitable(zone, pgdat->kcompactd_max_order, 0,
> > -                                     classzone_idx) == COMPACT_CONTINUE)
> > +                                     zoneid) == COMPACT_CONTINUE)
> >                       return true;
> >       }
> >
>
> This is a semantic change. The use of the classzone_idx here and not
> classzone_idx is so that the watermark check takes the lowmem reserves
> into account in the __zone_watermark_ok check. This means that
> compaction is more likely to proceed but not necessarily correct.
>
> > @@ -2526,7 +2527,6 @@ static void kcompactd_do_work(pg_data_t *pgdat)
> >       struct compact_control cc = {
> >               .order = pgdat->kcompactd_max_order,
> >               .search_order = pgdat->kcompactd_max_order,
> > -             .classzone_idx = pgdat->kcompactd_classzone_idx,
> >               .mode = MIGRATE_SYNC_LIGHT,
> >               .ignore_skip_hint = false,
> >               .gfp_mask = GFP_KERNEL,
> > @@ -2535,7 +2535,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
> >                                                       cc.classzone_idx);
> >       count_compact_event(KCOMPACTD_WAKE);
> >
> > -     for (zoneid = 0; zoneid <= cc.classzone_idx; zoneid++) {
> > +     for (zoneid = 0; zoneid <= pgdat->kcompactd_classzone_idx; zoneid++) {
> >               int status;
> >
> >               zone = &pgdat->node_zones[zoneid];
>
> This variable can be updated by a wakeup while the loop is executing
> making the loop more difficult to reason about given the exit conditions
> can change.
>

Thanks for pointing that out.

But it seems there are still issues even without my change?
For example:
if we call wakeup_kcompactd() while kcompactd is running,
we just modify kcompactd_max_order and kcompactd_classzone_idx and
then return.
Then, when wakeup_kcompactd() is called again from another path,
kcompactd_classzone_idx and kcompactd_max_order will be overridden,
which means the previous wakeup is missed.
Right?


> Please explain what exactly this patch is fixing and why it should be
> done because it currently appears to be making a number of subtle
> changes without justification.
>
> --
> Mel Gorman
> SUSE Labs
Mel Gorman July 26, 2019, 10:26 a.m. UTC | #3
On Fri, Jul 26, 2019 at 06:06:48PM +0800, Yafang Shao wrote:
> On Fri, Jul 26, 2019 at 3:09 PM Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > On Thu, Jul 25, 2019 at 09:50:21AM -0400, Yafang Shao wrote:
> > > By now there're three compaction paths,
> > > - direct compaction
> > > - kcompactd compcation
> > > - proc triggered compaction
> > > When we do compaction in all these paths, we will use compaction_suitable()
> > > to check whether a zone is suitable to do compaction.
> > >
> > > There're some issues around the usage of compaction_suitable().
> > > We don't use the proper zoneid in kcompactd_node_suitable() when try to
> > > wakeup kcompactd. In the kcompactd compaction paths, we call
> > > compaction_suitable() twice and the zoneid isn't proper in the second call.
> > > For proc triggered compaction, the classzone_idx is always zero.
> > >
> > > In order to fix these issues, I change the type of classzone_idx in the
> > > struct compact_control from const int to int and assign the proper zoneid
> > > before calling compact_zone().
> > >
> >
> > What is actually fixed by this?
> >
> 
> Recently there's a page alloc failure on our server because the
> compaction can't satisfy it.

That could be for a wide variety of reasons. There are limits to how
aggressive compaction will be but if there are unmovable pages preventing
the allocation, no amount of cleverness with the wakeups will change that.

> This issue is unproducible, so I have to view the compaction code and
> find out the possible solutions.

For high allocation success rates, the focus should be on strictness of
fragmentation control (hard, multiple tradeoffs) or increasing the number
of pages that can be moved (very hard, multiple tradeoffs).

> When I'm reading these compaction code, I find some  misuse of
> compaction_suitable().
> But after you point out, I find that I missed something.
> The classzone_idx should represent the alloc request, otherwise we may
> do unnecessary compaction on a zone.
> Thanks a lot for your explaination.
> 

Exactly.

> Hi Andrew,
> 
> Pls. help drop this patch. Sorry about that.

Agreed but there is no need to apologise. The full picture of this problem
is not obvious, not described anywhere and it's extremely difficult to
test and verify.

> > > <SNIP>
> > > @@ -2535,7 +2535,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
> > >                                                       cc.classzone_idx);
> > >       count_compact_event(KCOMPACTD_WAKE);
> > >
> > > -     for (zoneid = 0; zoneid <= cc.classzone_idx; zoneid++) {
> > > +     for (zoneid = 0; zoneid <= pgdat->kcompactd_classzone_idx; zoneid++) {
> > >               int status;
> > >
> > >               zone = &pgdat->node_zones[zoneid];
> >
> > This variable can be updated by a wakeup while the loop is executing
> > making the loop more difficult to reason about given the exit conditions
> > can change.
> >
> 
> Thanks for your point out.
> 
> But seems there're still issues event without my change ?
> For example,
> If we call wakeup_kcompactd() while the kcompactd is running,
> we just modify the kcompactd_max_order and kcompactd_classzone_idx and
> then return.
> Then in another path, the wakeup_kcompactd() is called again,
> so kcompactd_classzone_idx and kcompactd_max_order will be override,
> that means the previous wakeup is missed.
> Right ?
> 

That's intended. When kcompactd wakes up, it takes a snapshot of what is
requested and works on that. Other requests can update the requirements for
a future compaction request if necessary. One could argue that the wakeup
is missed but really it's "defer that request to some kcompactd activity
in the future". If kcompactd loops until there are no more requests, it
can consume an excessive amount of CPU due to requests continually keeping
it awake. kcompactd is best-effort to reduce the amount of direct stalls
due to compaction but an allocation request always faces the possibility
that it may stall because a kernel thread has not made enough progress
or failed.

FWIW, similar problems hit kswapd in the past, where allocation requests
could artificially keep it awake consuming 100% of CPU.
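
The snapshot-and-overwrite scheme described here can be sketched as a small
toy (the merge policy below — taking the maximum of pending requests — is
invented for illustration; the kernel's actual merging of kcompactd_max_order
and kcompactd_classzone_idx differs in detail):

```c
/* one pending request record, overwritten/extended by wakers */
static int pending_max_order;
static int pending_classzone_idx;

/* a wakeup merges into the pending request instead of queueing */
static void wakeup_request(int order, int classzone_idx)
{
	if (order > pending_max_order)
		pending_max_order = order;
	if (classzone_idx > pending_classzone_idx)
		pending_classzone_idx = classzone_idx;
}

struct snapshot {
	int order;
	int classzone_idx;
};

/* the worker takes one snapshot and works on that; later wakeups
 * build up a fresh pending request for a future pass */
static struct snapshot worker_take_snapshot(void)
{
	struct snapshot s = { pending_max_order, pending_classzone_idx };

	pending_max_order = 0;
	pending_classzone_idx = 0;
	return s;
}
```

Two wakeups arriving before the worker runs collapse into a single unit of
work rather than two — the CPU-bounding property described above — at the cost
that an individual wakeup is never serviced as its own pass.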
Yafang Shao July 26, 2019, 11:13 a.m. UTC | #4
On Fri, Jul 26, 2019 at 6:26 PM Mel Gorman <mgorman@techsingularity.net> wrote:
>
> On Fri, Jul 26, 2019 at 06:06:48PM +0800, Yafang Shao wrote:
> > On Fri, Jul 26, 2019 at 3:09 PM Mel Gorman <mgorman@techsingularity.net> wrote:
> > >
> > > On Thu, Jul 25, 2019 at 09:50:21AM -0400, Yafang Shao wrote:
> > > > By now there're three compaction paths,
> > > > - direct compaction
> > > > - kcompactd compcation
> > > > - proc triggered compaction
> > > > When we do compaction in all these paths, we will use compaction_suitable()
> > > > to check whether a zone is suitable to do compaction.
> > > >
> > > > There're some issues around the usage of compaction_suitable().
> > > > We don't use the proper zoneid in kcompactd_node_suitable() when try to
> > > > wakeup kcompactd. In the kcompactd compaction paths, we call
> > > > compaction_suitable() twice and the zoneid isn't proper in the second call.
> > > > For proc triggered compaction, the classzone_idx is always zero.
> > > >
> > > > In order to fix these issues, I change the type of classzone_idx in the
> > > > struct compact_control from const int to int and assign the proper zoneid
> > > > before calling compact_zone().
> > > >
> > >
> > > What is actually fixed by this?
> > >
> >
> > Recently there's a page alloc failure on our server because the
> > compaction can't satisfy it.
>
> That could be for a wide variety of reasons. There are limits to how
> aggressive compaction will be but if there are unmovable pages preventing
> the allocation, no amount of cleverness with the wakeups will change that.
>

Yes, we should first find out whether it is a lack of movable pages or
that compaction can't catch up.
I think it would be better if there were some debugging facilities that
could help us do that.

> > This issue is unproducible, so I have to view the compaction code and
> > find out the possible solutions.
>
> For high allocation success rates, the focus should be on strictness of
> fragmentation control (hard, multiple tradeoffs) or increasing the number
> of pages that can be moved (very hard, multiple tradeoffs).
>

Agreed, that's a tradeoff.

> > When I'm reading these compaction code, I find some  misuse of
> > compaction_suitable().
> > But after you point out, I find that I missed something.
> > The classzone_idx should represent the alloc request, otherwise we may
> > do unnecessary compaction on a zone.
> > Thanks a lot for your explaination.
> >
>
> Exactly.
>
> > Hi Andrew,
> >
> > Pls. help drop this patch. Sorry about that.
>
> Agreed but there is no need to apologise. The full picture of this problem
> is not obvious, not described anywhere and it's extremely difficult to
> test and verify.
>
> > > > <SNIP>
> > > > @@ -2535,7 +2535,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
> > > >                                                       cc.classzone_idx);
> > > >       count_compact_event(KCOMPACTD_WAKE);
> > > >
> > > > -     for (zoneid = 0; zoneid <= cc.classzone_idx; zoneid++) {
> > > > +     for (zoneid = 0; zoneid <= pgdat->kcompactd_classzone_idx; zoneid++) {
> > > >               int status;
> > > >
> > > >               zone = &pgdat->node_zones[zoneid];
> > >
> > > This variable can be updated by a wakeup while the loop is executing
> > > making the loop more difficult to reason about given the exit conditions
> > > can change.
> > >
> >
> > Thanks for your point out.
> >
> > But seems there're still issues event without my change ?
> > For example,
> > If we call wakeup_kcompactd() while the kcompactd is running,
> > we just modify the kcompactd_max_order and kcompactd_classzone_idx and
> > then return.
> > Then in another path, the wakeup_kcompactd() is called again,
> > so kcompactd_classzone_idx and kcompactd_max_order will be override,
> > that means the previous wakeup is missed.
> > Right ?
> >
>
> That's intended. When kcompactd wakes up, it takes a snapshot of what is
> requested and works on that. Other requests can update the requirements for
> a future compaction request if necessary. One could argue that the wakeup
> is missed but really it's "defer that request to some kcompactd activity
> in the future". If kcompactd loops until there are no more requests, it
> can consume an excessive amount of CPU due to requests continually keeping
> it awake. kcompactd is best-effort to reduce the amount of direct stalls
> due to compaction but an allocation request always faces the possibility
> that it may stall because a kernel thread has not made enough progress
> or failed.
>
> FWIW, similar problems hit kswapd in the past where allocation requests
> could artifically keep it awake consuming 100% of CPU.
>

Understood. Thanks for your explanation again.

Thanks
Yafang

Patch

diff --git a/mm/compaction.c b/mm/compaction.c
index ac4ead0..984dea7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2425,6 +2425,7 @@  static void compact_node(int nid)
 			continue;
 
 		cc.zone = zone;
+		cc.classzone_idx = zoneid;
 
 		compact_zone(&cc, NULL);
 
@@ -2508,7 +2509,7 @@  static bool kcompactd_node_suitable(pg_data_t *pgdat)
 			continue;
 
 		if (compaction_suitable(zone, pgdat->kcompactd_max_order, 0,
-					classzone_idx) == COMPACT_CONTINUE)
+					zoneid) == COMPACT_CONTINUE)
 			return true;
 	}
 
@@ -2526,7 +2527,6 @@  static void kcompactd_do_work(pg_data_t *pgdat)
 	struct compact_control cc = {
 		.order = pgdat->kcompactd_max_order,
 		.search_order = pgdat->kcompactd_max_order,
-		.classzone_idx = pgdat->kcompactd_classzone_idx,
 		.mode = MIGRATE_SYNC_LIGHT,
 		.ignore_skip_hint = false,
 		.gfp_mask = GFP_KERNEL,
@@ -2535,7 +2535,7 @@  static void kcompactd_do_work(pg_data_t *pgdat)
 							cc.classzone_idx);
 	count_compact_event(KCOMPACTD_WAKE);
 
-	for (zoneid = 0; zoneid <= cc.classzone_idx; zoneid++) {
+	for (zoneid = 0; zoneid <= pgdat->kcompactd_classzone_idx; zoneid++) {
 		int status;
 
 		zone = &pgdat->node_zones[zoneid];
@@ -2545,14 +2545,12 @@  static void kcompactd_do_work(pg_data_t *pgdat)
 		if (compaction_deferred(zone, cc.order))
 			continue;
 
-		if (compaction_suitable(zone, cc.order, 0, zoneid) !=
-							COMPACT_CONTINUE)
-			continue;
-
 		if (kthread_should_stop())
 			return;
 
 		cc.zone = zone;
+		cc.classzone_idx = zoneid;
+
 		status = compact_zone(&cc, NULL);
 
 		if (status == COMPACT_SUCCESS) {
diff --git a/mm/internal.h b/mm/internal.h
index 0d5f720..c224a16 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -190,11 +190,11 @@  struct compact_control {
 	unsigned long total_free_scanned;
 	unsigned short fast_search_fail;/* failures to use free list searches */
 	short search_order;		/* order to start a fast search at */
-	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
-	int order;			/* order a direct compactor needs */
-	int migratetype;		/* migratetype of direct compactor */
-	const unsigned int alloc_flags;	/* alloc flags of a direct compactor */
-	const int classzone_idx;	/* zone index of a direct compactor */
+	const gfp_t gfp_mask;		/* gfp mask of a compactor */
+	int order;			/* order a compactor needs */
+	int migratetype;		/* migratetype of a compactor */
+	const unsigned int alloc_flags;	/* alloc flags of a compactor */
+	int classzone_idx;		/* zone index of a compactor */
 	enum migrate_mode mode;		/* Async or sync migration mode */
 	bool ignore_skip_hint;		/* Scan blocks even if marked skip */
 	bool no_set_skip_hint;		/* Don't mark blocks for skipping */