From patchwork Wed Nov 21 10:14:10 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mel Gorman
X-Patchwork-Id: 10692277
From: Mel Gorman
To: Linux-MM
Cc: Andrew Morton, Vlastimil Babka, David Rientjes, Andrea Arcangeli,
 Zi Yan, Michal Hocko, LKML, Mel Gorman
Subject: [PATCH 0/4] Fragmentation avoidance improvements v4
Date: Wed, 21 Nov 2018 10:14:10 +0000
Message-Id: <20181121101414.21301-1-mgorman@techsingularity.net>

No major change from v3 really, mostly resending to see if there is any
review reaction. It's rebased but a partial test indicated that the
behaviour is similar to the previous baseline.

Changelog since v3
o Rebase to 4.20-rc3
o Remove a stupid warning from the last patch

Changelog since v2
o Drop patch 5 as it was borderline
o Decrease timeout when stalling on fragmentation events

Changelog since v1
o Rebase to v4.20-rc1 for the THP __GFP_THISNODE patch in particular
o Add tracepoint to record fragmentation stall durations
o Add vmstat event to record that a fragmentation stall occurred
o Stalls now alter watermark boosting
o Stalls occur only when the allocation is about to fail

It has been noted before that fragmentation avoidance (aka
anti-fragmentation) is not perfect. Given sufficient time or an adverse
workload, memory gets fragmented and the long-term success of high-order
allocations degrades. This series defines an adverse workload, defines
external fragmentation events (including serious ones) and reduces the
level of those fragmentation events.

The workload and its consequences are described in more detail in the
changelogs. However, from patch 1, this is a high-level summary of the
adverse workload. The exact details are found in the mmtests
implementation.

The broad details of the workload are as follows:

1. Create an XFS filesystem (not specified in the configuration but done
   as part of the testing for this patch).
2. Start 4 fio threads that write a number of 64K files inefficiently.
   Inefficiently means that files are created on first access and not
   created in advance (fio parameter create_on_open=1) and fallocate is
   not used (fallocate=none). With multiple IO issuers this creates a mix
   of slab and page cache allocations over time. The total size of the
   files is 150% of physical memory so that the slabs and page cache
   pages get mixed.
3. Warm up a number of fio read-only threads accessing the same files
   created in step 2. This part runs for the same length of time it took
   to create the files. It'll fault back in old data and further
   interleave slab and page cache allocations. As it's now low on memory
   due to step 2, fragmentation occurs as pageblocks get stolen.
4. While step 3 is still running, start a process that tries to allocate
   75% of memory as huge pages with a number of threads. The number of
   threads is based on (NR_CPUS_SOCKET - NR_FIO_THREADS)/4 to avoid THP
   threads contending with fio or any other threads, or forcing
   cross-NUMA scheduling. Note that the test has not been used on a
   machine with fewer than 8 cores. The benchmark records whether huge
   pages were allocated and what the fault latency was in microseconds.
5. Measure the number of events potentially causing external
   fragmentation, the fault latency and the huge page allocation success
   rate.
6. Cleanup.

Overall the series reduces events that cause external fragmentation by
over 95% on 1 and 2 socket machines, which in turn impacts high-order
allocation success rates over the long term. There are differences in
latencies and high-order allocation success rates. Latencies are a mixed
bag as they are vulnerable to exact system state and whether allocations
succeeded so they are treated as a secondary metric.

Patch 1 uses lower zones if they are populated and have free memory
instead of fragmenting a higher zone. It's special cased to handle a
Normal->DMA32 fallback with the reasons explained in the changelog.

Patches 2 and 3 boost watermarks temporarily when an external
fragmentation event occurs. kswapd wakes to reclaim a small amount of old
memory and then wakes kcompactd on completion to recover the system
slightly. This introduces some overhead in the slowpath. The level of
boosting can be tuned or disabled depending on the tolerance for
fragmentation vs allocation latency.

Patch 4 is more heavy handed.
In the event of a movable allocation request that can stall, it'll wake
kswapd as in patch 3. However, if the expected fragmentation event is
serious then the request will stall briefly on pfmemalloc_wait until
kswapd completes light reclaim work, and then retry the allocation
without stalling. This can avoid the fragmentation event entirely in some
cases. The definition of a serious fragmentation event can be tuned or
disabled.

The bulk of the improvement in fragmentation avoidance is from patches
1-3 (94-97% reduction in fragmentation events for an adverse workload on
both a 1-socket and 2-socket machine). The primary benefit of patch 4 is
the increase in THP success rates and the fact that it reduces
fragmentation events to almost negligible levels, with the option of
eliminating them.

 Documentation/sysctl/vm.txt   |  42 ++++++++
 include/linux/mm.h            |   2 +
 include/linux/mmzone.h        |  14 ++-
 include/linux/vm_event_item.h |   1 +
 include/trace/events/kmem.h   |  21 ++++
 kernel/sysctl.c               |  18 ++++
 mm/compaction.c               |   2 +-
 mm/internal.h                 |  14 ++-
 mm/page_alloc.c               | 238 ++++++++++++++++++++++++++++++++++++++----
 mm/vmscan.c                   | 123 ++++++++++++++++++++--
 mm/vmstat.c                   |   1 +
 11 files changed, 436 insertions(+), 40 deletions(-)
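
As a rough illustration of how the adverse workload described in steps 2
and 4 scales with the machine, the sizing arithmetic can be sketched as
below. This is not the mmtests implementation. The names plan_workload
and WorkloadPlan are hypothetical, the 2M huge page size is an assumption
(typical for x86-64), and so are the integer rounding and the clamp to at
least one THP thread.

```python
from dataclasses import dataclass

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


@dataclass(frozen=True)
class WorkloadPlan:
    fio_file_count: int    # number of 64K files written in step 2
    thp_threads: int       # allocator threads started in step 4
    thp_target_pages: int  # huge pages the allocator tries to obtain


def plan_workload(mem_bytes, cpus_per_socket, fio_threads=4,
                  file_size=64 * KIB, huge_page_size=2 * MIB):
    """Hypothetical helper: size the workload for a given machine."""
    # Step 2: the total size of the files is 150% of physical memory.
    fio_file_count = (mem_bytes * 3 // 2) // file_size
    # Step 4: (NR_CPUS_SOCKET - NR_FIO_THREADS) / 4.
    # Floor division and the minimum of one thread are assumptions.
    thp_threads = max(1, (cpus_per_socket - fio_threads) // 4)
    # Step 4: try to allocate 75% of memory as huge pages.
    # The 2M default huge page size is an assumption of this sketch.
    thp_target_pages = (mem_bytes * 3 // 4) // huge_page_size
    return WorkloadPlan(fio_file_count, thp_threads, thp_target_pages)


if __name__ == "__main__":
    print(plan_workload(16 * GIB, cpus_per_socket=8))
```

With 8 cores per socket and 4 fio threads the formula leaves a single THP
thread, which is consistent with the note that the test has not been used
on machines with fewer than 8 cores.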