[0/3] Recalculate per-cpu page allocator batch and high limits after deferred meminit

Message ID 20191018105606.3249-1-mgorman@techsingularity.net (mailing list archive)

Message

Mel Gorman Oct. 18, 2019, 10:56 a.m. UTC
A private report stated that system CPU usage was excessive on an AMD
EPYC 2 machine while building kernels with much longer build times than
expected. The issue is partially explained by high zone lock contention
due to the per-cpu page allocator batch and high limits being calculated
incorrectly. This series addresses a large chunk of the problem. Patch 1
is mostly cosmetic but prepares for patch 2 which is the real fix. Patch
3 is definitely cosmetic but was noticed while implementing the fix. Proper
details are in the changelog for patch 2.

 include/linux/mm.h |  3 ---
 mm/internal.h      |  3 +++
 mm/page_alloc.c    | 33 ++++++++++++++++++++-------------
 3 files changed, 23 insertions(+), 16 deletions(-)

Comments

Matt Fleming Oct. 18, 2019, 11:58 a.m. UTC | #1
On Fri, 18 Oct, at 11:56:03AM, Mel Gorman wrote:
> A private report stated that system CPU usage was excessive on an AMD
> EPYC 2 machine while building kernels with much longer build times than
> expected. The issue is partially explained by high zone lock contention
> due to the per-cpu page allocator batch and high limits being calculated
> incorrectly. This series addresses a large chunk of the problem. Patch 1
> is mostly cosmetic but prepares for patch 2 which is the real fix. Patch
> 3 is definitely cosmetic but was noticed while implementing the fix. Proper
> details are in the changelog for patch 2.
> 
>  include/linux/mm.h |  3 ---
>  mm/internal.h      |  3 +++
>  mm/page_alloc.c    | 33 ++++++++++++++++++++-------------
>  3 files changed, 23 insertions(+), 16 deletions(-)

Just to confirm, these patches don't fix the issue we're seeing on the
EPYC 2 machines, but they do return the batch sizes to sensible values.

Mel Gorman Oct. 18, 2019, 12:54 p.m. UTC | #2
On Fri, Oct 18, 2019 at 12:58:49PM +0100, Matt Fleming wrote:
> On Fri, 18 Oct, at 11:56:03AM, Mel Gorman wrote:
> > A private report stated that system CPU usage was excessive on an AMD
> > EPYC 2 machine while building kernels with much longer build times than
> > expected. The issue is partially explained by high zone lock contention
> > due to the per-cpu page allocator batch and high limits being calculated
> > incorrectly. This series addresses a large chunk of the problem. Patch 1
> > is mostly cosmetic but prepares for patch 2 which is the real fix. Patch
> > 3 is definitely cosmetic but was noticed while implementing the fix. Proper
> > details are in the changelog for patch 2.
> > 
> >  include/linux/mm.h |  3 ---
> >  mm/internal.h      |  3 +++
> >  mm/page_alloc.c    | 33 ++++++++++++++++++++-------------
> >  3 files changed, 23 insertions(+), 16 deletions(-)
> 
> Just to confirm, these patches don't fix the issue we're seeing on the
> EPYC 2 machines, but they do return the batch sizes to sensible values.

To be clear, does the patch a) fix *some* of the issue and there is
something else also going on that needs to be chased down or b) has no
impact on build time or system CPU usage on your machine?

Matt Fleming Oct. 18, 2019, 2:48 p.m. UTC | #3
On Fri, 18 Oct, at 01:54:49PM, Mel Gorman wrote:
> On Fri, Oct 18, 2019 at 12:58:49PM +0100, Matt Fleming wrote:
> > On Fri, 18 Oct, at 11:56:03AM, Mel Gorman wrote:
> > > A private report stated that system CPU usage was excessive on an AMD
> > > EPYC 2 machine while building kernels with much longer build times than
> > > expected. The issue is partially explained by high zone lock contention
> > > due to the per-cpu page allocator batch and high limits being calculated
> > > incorrectly. This series addresses a large chunk of the problem. Patch 1
> > > is mostly cosmetic but prepares for patch 2 which is the real fix. Patch
> > > 3 is definitely cosmetic but was noticed while implementing the fix. Proper
> > > details are in the changelog for patch 2.
> > > 
> > >  include/linux/mm.h |  3 ---
> > >  mm/internal.h      |  3 +++
> > >  mm/page_alloc.c    | 33 ++++++++++++++++++++-------------
> > >  3 files changed, 23 insertions(+), 16 deletions(-)
> > 
> > Just to confirm, these patches don't fix the issue we're seeing on the
> > EPYC 2 machines, but they do return the batch sizes to sensible values.
> 
> To be clear, does the patch a) fix *some* of the issue and there is
> something else also going on that needs to be chased down or b) has no
> impact on build time or system CPU usage on your machine?

Sorry, I realise my email was pretty unclear.

These patches *do* fix some of the issue because I no longer see as
much contention on the zone locks with the patches applied.