[4/4] mm: Stall movable allocations until kswapd progresses during serious external fragmentation event

An event that potentially causes external fragmentation problems has
already been described but there are degrees of severity.  A "serious"
event is defined as one that steals a contiguous range of pages of an order
lower than fragment_stall_order (PAGE_ALLOC_COSTLY_ORDER by default). If a
movable allocation request that is allowed to sleep needs to steal a small
block then it sleeps until kswapd makes progress or a timeout passes.
The watermarks are also boosted slightly faster so that kswapd makes a
greater effort to reclaim enough pages to avoid the fragmentation event.
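
The stall decision described above can be sketched in userspace C. This is
only an illustration of the stated conditions, not the code in
mm/page_alloc.c: the struct, the enum and every sketch_* name are
hypothetical, and only fragment_stall_order and the default of
PAGE_ALLOC_COSTLY_ORDER (3 in mainline) come from the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for PAGE_ALLOC_COSTLY_ORDER, which is 3 in mainline. */
#define SKETCH_COSTLY_ORDER 3

/* Hypothetical, simplified migratetypes for illustration only. */
enum sketch_migratetype {
	SKETCH_UNMOVABLE,
	SKETCH_MOVABLE,
	SKETCH_RECLAIMABLE,
};

/* Hypothetical summary of an allocation that is about to steal a block. */
struct sketch_request {
	enum sketch_migratetype migratetype;
	bool can_sleep;    /* false for e.g. atomic allocations */
	int stolen_order;  /* order of the block taken from another migratetype */
};

/* Tunable named in the changelog; defaults to the costly order. */
static int fragment_stall_order = SKETCH_COSTLY_ORDER;

/*
 * Return true if the request should wait for kswapd (or a timeout)
 * before stealing: it must be movable, allowed to sleep, and the
 * steal must be "serious", i.e. of an order below fragment_stall_order.
 */
static bool sketch_should_stall(const struct sketch_request *req)
{
	assert(req->stolen_order >= 0);

	if (req->migratetype != SKETCH_MOVABLE)
		return false;
	if (!req->can_sleep)
		return false;
	return req->stolen_order < fragment_stall_order;
}
```

With the default threshold, a sleepable movable request stealing an order-0
block would stall in this sketch, while the same request stealing an order-3
block, or any atomic or unmovable request, would not.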

This stall is not guaranteed to avoid serious fragmentation events.
If memory pressure is high enough, the pages freed by kswapd may be
reallocated or the free pages may not be in pageblocks that contain
only movable pages. Furthermore, allocation requests that cannot stall
(e.g. atomic allocations) and unmovable/reclaimable allocations will still
proceed without stalling.

The worst-case scenario for stalling is a combination of high memory
pressure, where kswapd is having trouble keeping free pages over the
pfmemalloc_reserve, and movable allocations that are fragmenting memory.
In this case, an allocation request may sleep for longer. There are both
vmstats to identify that stalls are happening and a tracepoint to quantify
the stall durations. Note that the granularity of the stall detection is
a jiffy so the delay accounting is not precise.
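
To illustrate why jiffy-granular accounting is imprecise, the following
userspace sketch models a stall measured only by sampling a tick counter
at the start and end. The sketch_* helpers are hypothetical and the HZ
value used in the note afterwards is an assumption; the real accounting
lives in the patch itself.

```c
#include <assert.h>

/*
 * Model a jiffy counter: number of whole ticks elapsed at time t_us,
 * where each tick lasts 1000000 / hz microseconds.
 */
static unsigned long sketch_jiffies_at(unsigned long t_us, unsigned int hz)
{
	assert(hz > 0 && hz <= 1000000);
	return t_us / (1000000UL / hz);
}

/*
 * Duration that would be accounted for a stall if only the tick counter
 * is sampled at the start and at the end of the stall.
 */
static unsigned long sketch_accounted_us(unsigned long start_us,
					 unsigned long end_us,
					 unsigned int hz)
{
	unsigned long tick_us;

	assert(hz > 0 && hz <= 1000000);
	assert(end_us >= start_us);

	tick_us = 1000000UL / hz;
	return (sketch_jiffies_at(end_us, hz) -
		sketch_jiffies_at(start_us, hz)) * tick_us;
}
```

Assuming HZ=250 (a 4000us tick), a 1000us stall that stays inside one tick
is accounted as 0 in this model, while a 200us stall that happens to cross
a tick boundary is accounted as a full 4000us.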

1-socket Skylake machine
config-global-dhp__workload_thpfioscale XFS (no special madvise)
4 fio threads, 1 THP allocating thread
--------------------------------------

4.20-rc1 extfrag events < order 9:  1023463
4.20-rc1+patch:                      358574 (65% reduction)
4.20-rc1+patch1-3:                    19274 (98% reduction)
4.20-rc1+patch1-4:                     1094 (99.9% reduction)
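
The reduction figures in these summaries are consistent with a simple
relative change against the unpatched baseline. A minimal sketch of that
arithmetic follows; the function name is hypothetical and the reported
percentages appear to be rounded.

```c
#include <assert.h>

/*
 * Percentage reduction in external fragmentation events relative to
 * the baseline kernel: 100 * (1 - patched / baseline).
 */
static double reduction_percent(unsigned long baseline, unsigned long patched)
{
	assert(baseline > 0);
	return 100.0 * (1.0 - (double)patched / (double)baseline);
}
```

For example, reduction_percent(1023463, 358574) is just under 65, matching
the "65% reduction" line above.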

thpfioscale Fault Latencies
                                   4.20.0-rc1             4.20.0-rc1
                                   boost-v3r1             stall-v3r1
Amean     fault-base-1      659.85 (   0.00%)      658.74 (   0.17%)
Amean     fault-huge-1      172.19 (   0.00%)      168.00 (   2.43%)

thpfioscale Percentage Faults Huge
                              4.20.0-rc1             4.20.0-rc1
                              boost-v3r1             stall-v3r1
Percentage huge-1        1.68 (   0.00%)        0.88 ( -47.52%)

Fragmentation events are now reduced to negligible levels.

The latencies and allocation success rates are roughly similar.  Over the
course of 16 minutes, there were 52 stalls due to fragmentation avoidance
with a total stall time of 0.2 seconds.

1-socket Skylake machine
global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
-----------------------------------------------------------------

4.20-rc1 extfrag events < order 9:  342549
4.20-rc1+patch:                     337890 ( 1% reduction)
4.20-rc1+patch1-3:                   12801 (96% reduction)
4.20-rc1+patch1-4:                    1112 (99.7% reduction)

thpfioscale Fault Latencies
                                   4.20.0-rc1             4.20.0-rc1
                                   boost-v3r1             stall-v3r1
Amean     fault-base-1     1578.91 (   0.00%)     1647.00 (  -4.31%)
Amean     fault-huge-1     1090.23 (   0.00%)      559.31 *  48.70%*

thpfioscale Percentage Faults Huge
                              4.20.0-rc1             4.20.0-rc1
                              boost-v3r1             stall-v3r1
Percentage huge-1       82.59 (   0.00%)       99.98 (  21.05%)

The fragmentation events were reduced and the latencies are good. This
is a big difference between v2 and v3 of the series as v2 had stalls
that reached the timeout of HZ/10 whereas a timeout of HZ/50 has better
latencies without compromising on fragmentation events or allocation
success rates.

There were 219 stalls over the course of 16 minutes for a total stall
time of roughly 1 second (as opposed to 11 seconds with HZ/10). The
distribution of stalls (number of stalls, then stall duration) is as follows:

    209 4000
      1 8000
      9 20000

This shows the majority of stalls were for just one jiffy.
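
The totals quoted above can be cross-checked from the distribution. The
sketch below assumes the second column is a duration in microseconds; that
unit is not stated here, but it is consistent with a total of roughly 1
second and with 20000 matching the HZ/50 timeout (20ms). The struct and
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One line of the distribution: how many stalls lasted how long. */
struct stall_bucket {
	unsigned long count;
	unsigned long duration_us; /* assumed unit: microseconds */
};

/* Sum the number of stalls across all buckets. */
static unsigned long total_stalls(const struct stall_bucket *b, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	assert(b != NULL || n == 0);
	for (i = 0; i < n; i++)
		sum += b[i].count;
	return sum;
}

/* Sum count * duration across all buckets. */
static unsigned long total_stall_us(const struct stall_bucket *b, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	assert(b != NULL || n == 0);
	for (i = 0; i < n; i++)
		sum += b[i].count * b[i].duration_us;
	return sum;
}
```

Fed the three lines above, this gives 219 stalls and 1024000 in total,
i.e. about 1 second under the microsecond assumption. If a jiffy is 4000us
(HZ=250, also an assumption), 209 of the 219 stalls lasted a single jiffy
and 9 ran to the full timeout.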

2-socket Haswell machine
config-global-dhp__workload_thpfioscale XFS (no special madvise)
4 fio threads, 5 THP allocating threads
----------------------------------------------------------------

4.20-rc1 extfrag events < order 9:  209820
4.20-rc1+patch:                     185923 (11% reduction)
4.20-rc1+patch1-3:                   11240 (95% reduction)
4.20-rc1+patch1-4:                    8709 (96% reduction)

thpfioscale Fault Latencies
                                   4.20.0-rc1             4.20.0-rc1
                                   boost-v3r1             stall-v3r1
Amean     fault-base-5     1395.28 (   0.00%)     1335.23 (   4.30%)
Amean     fault-huge-5      539.69 (   0.00%)      614.88 * -13.93%*

thpfioscale Percentage Faults Huge
                              4.20.0-rc1             4.20.0-rc1
                              boost-v3r1             stall-v3r1
Percentage huge-5        0.53 (   0.00%)        2.16 ( 306.25%)

There is a slight reduction in fragmentation events but it is small
enough that it may be due to luck.  There is a small increase in latencies
which is partially offset by a slight increase in THP allocation success
rates. There were 65 stalls over the course of 63 minutes with a total
stall time of roughly 0.2 seconds.

2-socket Haswell machine
global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
-----------------------------------------------------------------

4.20-rc1 extfrag events < order 9: 167464
4.20-rc1+patch:                    130081 (22% reduction)
4.20-rc1+patch1-3:                  12057 (92% reduction)
4.20-rc1+patch1-4:                  11494 (93% reduction)

thpfioscale Fault Latencies
                                   4.20.0-rc1             4.20.0-rc1
                                   boost-v3r1             stall-v3r1
Amean     fault-base-5     8691.83 (   0.00%)     7380.80 (  15.08%)
Amean     fault-huge-5     2899.83 (   0.00%)     4066.94 * -40.25%*

thpfioscale Percentage Faults Huge
                              4.20.0-rc1             4.20.0-rc1
                              boost-v3r1             stall-v3r1
Percentage huge-5       95.55 (   0.00%)       98.98 (   3.59%)

The fragmentation events are reduced and while there is some wobble on
the latency, the success rate is near 100% while under heavy pressure.
There were 2016 stalls over the course of 85 minutes with a total stall
time of roughly 8 seconds.

This patch does reduce fragmentation rates overall but it's not free
as some allocations can stall for short periods of time and there
are knock-on effects to latency when THP allocation success rates are
higher. While it's within acceptable limits for the adverse test case,
there may be other workloads that cannot tolerate the stalls. If this
occurs, the feature can be disabled via the tunable or, more ideally, the
test case is made available for analysis to see if the stall behaviour can
be reduced while still limiting the fragmentation events. On the flip-side,
it has been checked that setting the fragment_stall_order to 9 eliminated
fragmentation events entirely.
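
As a rough sketch of how the tunable might be adjusted from userspace, the
helper below writes an integer to a sysctl-style file. The path
/proc/sys/vm/fragment_stall_order is an assumption: the diffstat touches
kernel/sysctl.c and Documentation/sysctl/vm.txt, which suggests a vm
sysctl, but the exact name and accepted values should be checked against
the documentation hunk. The sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed location of the tunable; not stated in this changelog. */
#define SKETCH_STALL_ORDER_PATH "/proc/sys/vm/fragment_stall_order"

/*
 * Write an integer followed by a newline to a sysctl-style file.
 * Returns 0 on success and -1 if the file cannot be opened or written.
 */
static int sketch_write_int(const char *path, int value)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Under that assumption, sketch_write_int(SKETCH_STALL_ORDER_PATH, 9) run as
root would select the setting that eliminated fragmentation events in the
testing described above.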

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 Documentation/sysctl/vm.txt   | 23 +++++++++++
 include/linux/mm.h            |  1 +
 include/linux/mmzone.h        |  2 +
 include/linux/vm_event_item.h |  1 +
 include/trace/events/kmem.h   | 21 ++++++++++
 kernel/sysctl.c               | 10 +++++
 mm/internal.h                 |  1 +
 mm/page_alloc.c               | 94 +++++++++++++++++++++++++++++++++++++------
 mm/vmstat.c                   |  1 +
 9 files changed, 142 insertions(+), 12 deletions(-)

Message-Id: <20181108091218.32715-5-mgorman@techsingularity.net>
In-Reply-To: <20181108091218.32715-1-mgorman@techsingularity.net>
From: Mel Gorman <mgorman@techsingularity.net>
To: Linux-MM <linux-mm@kvack.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
    Vlastimil Babka <vbabka@suse.cz>,
    David Rientjes <rientjes@google.com>,
    Andrea Arcangeli <aarcange@redhat.com>,
    Zi Yan <zi.yan@cs.rutgers.edu>,
    LKML <linux-kernel@vger.kernel.org>,
    Mel Gorman <mgorman@techsingularity.net>
Date: Thu, 8 Nov 2018 09:12:18 +0000
Series: Fragmentation avoidance improvements v3
