Message ID | 20190104125011.16071-17-mgorman@techsingularity.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Increase success rates and reduce latency of compaction v2 | expand |
On 1/4/19 1:50 PM, Mel Gorman wrote: > When scanning for sources or targets, PageCompound is checked for huge > pages as they can be skipped quickly but it happens relatively late after > a lot of setup and checking. This patch short-cuts the check to make it > earlier. It might still change when the lock is acquired but this has > less overhead overall. The free scanner advances but the migration scanner > does not. Typically the free scanner encounters more movable blocks that > change state over the lifetime of the system and also tends to scan more > aggressively as it's actively filling its portion of the physical address > space with data. This could change in the future but for the moment, > this worked better in practice and incurred fewer scan restarts. > > The impact on latency and allocation success rates is marginal but the > free scan rates are reduced by 32% and system CPU usage is reduced by > 2.6%. The 2-socket results are not materially different. Hmm, interesting that adjusting migrate scanner affected free scanner. Oh well. > Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Nit below. > --- > mm/compaction.c | 16 ++++++++++++---- > 1 file changed, 12 insertions(+), 4 deletions(-) > > diff --git a/mm/compaction.c b/mm/compaction.c > index 608d274f9880..921720f7a416 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -1071,6 +1071,9 @@ static bool suitable_migration_source(struct compact_control *cc, > { > int block_mt; > > + if (pageblock_skip_persistent(page)) > + return false; > + > if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction) > return true; > > @@ -1693,12 +1696,17 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, > continue; > > /* > - * For async compaction, also only scan in MOVABLE blocks. > - * Async compaction is optimistic to see if the minimum amount > - * of work satisfies the allocation. > + * For async compaction, also only scan in MOVABLE blocks > + * without huge pages. Async compaction is optimistic to see > + * if the minimum amount of work satisfies the allocation. > + * The cached PFN is updated as it's possible that all > + * remaining blocks between source and target are suitable ^ unsuitable? > + * and the compaction scanners fail to meet. > */ > - if (!suitable_migration_source(cc, page)) > + if (!suitable_migration_source(cc, page)) { > + update_cached_migrate(cc, block_end_pfn); > continue; > + } > > /* Perform the isolation */ > low_pfn = isolate_migratepages_block(cc, low_pfn, >
On Thu, Jan 17, 2019 at 06:01:18PM +0100, Vlastimil Babka wrote: > On 1/4/19 1:50 PM, Mel Gorman wrote: > > When scanning for sources or targets, PageCompound is checked for huge > > pages as they can be skipped quickly but it happens relatively late after > > a lot of setup and checking. This patch short-cuts the check to make it > > earlier. It might still change when the lock is acquired but this has > > less overhead overall. The free scanner advances but the migration scanner > > does not. Typically the free scanner encounters more movable blocks that > > change state over the lifetime of the system and also tends to scan more > > aggressively as it's actively filling its portion of the physical address > > space with data. This could change in the future but for the moment, > > this worked better in practice and incurred fewer scan restarts. > > > > The impact on latency and allocation success rates is marginal but the > > free scan rates are reduced by 32% and system CPU usage is reduced by > > 2.6%. The 2-socket results are not materially different. > > Hmm, interesting that adjusting migrate scanner affected free scanner. Oh well. > Russian Roulette again. The exact scan rates depend on the system state which are non-deterministic. It's not until very late in the series that they stabilise somewhat. In fact, during the development of the series, I had to reorder patches multiple times when a corner case was dealt with to avoid 1 in every 3-6 runs having crazy insane scan rates. The final ordering was based on *relative* stability. > > Signed-off-by: Mel Gorman <mgorman@techsingularity.net> > > Acked-by: Vlastimil Babka <vbabka@suse.cz> > > Nit below. > Nit fixed.
diff --git a/mm/compaction.c b/mm/compaction.c index 608d274f9880..921720f7a416 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1071,6 +1071,9 @@ static bool suitable_migration_source(struct compact_control *cc, { int block_mt; + if (pageblock_skip_persistent(page)) + return false; + if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction) return true; @@ -1693,12 +1696,17 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, continue; /* - * For async compaction, also only scan in MOVABLE blocks. - * Async compaction is optimistic to see if the minimum amount - * of work satisfies the allocation. + * For async compaction, also only scan in MOVABLE blocks + * without huge pages. Async compaction is optimistic to see + * if the minimum amount of work satisfies the allocation. + * The cached PFN is updated as it's possible that all + * remaining blocks between source and target are suitable + * and the compaction scanners fail to meet. */ - if (!suitable_migration_source(cc, page)) + if (!suitable_migration_source(cc, page)) { + update_cached_migrate(cc, block_end_pfn); continue; + } /* Perform the isolation */ low_pfn = isolate_migratepages_block(cc, low_pfn,
When scanning for sources or targets, PageCompound is checked for huge pages as they can be skipped quickly but it happens relatively late after a lot of setup and checking. This patch short-cuts the check to make it earlier. It might still change when the lock is acquired but this has less overhead overall. The free scanner advances but the migration scanner does not. Typically the free scanner encounters more movable blocks that change state over the lifetime of the system and also tends to scan more aggressively as it's actively filling its portion of the physical address space with data. This could change in the future but for the moment, this worked better in practice and incurred fewer scan restarts. The impact on latency and allocation success rates is marginal but the free scan rates are reduced by 32% and system CPU usage is reduced by 2.6%. The 2-socket results are not materially different. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> --- mm/compaction.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)