From patchwork Mon Aug 20 03:22:03 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrea Arcangeli
X-Patchwork-Id: 10569853
From: Andrea Arcangeli
To: Andrew Morton
Cc: linux-mm@kvack.org, Alex Williamson, David Rientjes, Vlastimil Babka
Subject: [PATCH 1/2] mm: thp: consolidate policy_nodemask call
Date: Sun, 19 Aug 2018 23:22:03 -0400
Message-Id: <20180820032204.9591-2-aarcange@redhat.com>
In-Reply-To:
 <20180820032204.9591-1-aarcange@redhat.com>
References: <20180820032204.9591-1-aarcange@redhat.com>

Just a minor cleanup.

Signed-off-by: Andrea Arcangeli
---
 mm/mempolicy.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 01f1a14facc4..d6512ef28cde 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2026,6 +2026,8 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		goto out;
 	}
 
+	nmask = policy_nodemask(gfp, pol);
+
 	if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage)) {
 		int hpage_node = node;
 
@@ -2043,7 +2045,6 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 						!(pol->flags & MPOL_F_LOCAL))
 			hpage_node = pol->v.preferred_node;
 
-		nmask = policy_nodemask(gfp, pol);
 		if (!nmask || node_isset(hpage_node, *nmask)) {
 			mpol_cond_put(pol);
 			page = __alloc_pages_node(hpage_node,
@@ -2052,7 +2053,6 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		}
 	}
 
-	nmask = policy_nodemask(gfp, pol);
 	preferred_nid = policy_node(gfp, pol, node);
 	page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask);
 	mpol_cond_put(pol);

From patchwork Mon Aug 20 03:22:04 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrea Arcangeli
X-Patchwork-Id: 10569851
Received: from mail.wl.linuxfoundation.org
 (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 67A6828D28 for ; Mon, 20 Aug 2018 03:22:15 +0000 (UTC)
From: Andrea Arcangeli
To: Andrew Morton
Cc: linux-mm@kvack.org, Alex Williamson, David Rientjes, Vlastimil Babka
Subject: [PATCH 2/2] mm: thp: fix transparent_hugepage/defrag = madvise || always
Date: Sun, 19 Aug 2018 23:22:04 -0400
Message-Id: <20180820032204.9591-3-aarcange@redhat.com>
In-Reply-To: <20180820032204.9591-1-aarcange@redhat.com>
References: <20180820032204.9591-1-aarcange@redhat.com>
X-Virus-Scanned: ClamAV using
 ClamSMTP

qemu uses MADV_HUGEPAGE which allows direct compaction (i.e.
__GFP_DIRECT_RECLAIM is set).

The problem is that direct compaction combined with the NUMA
__GFP_THISNODE logic in mempolicy.c tells reclaim to swap the local
node very hard, instead of failing the allocation if there's no THP
available in the local node.

Such logic was ok until __GFP_THISNODE was added to the THP allocation
path even with MPOL_DEFAULT.

The idea behind the __GFP_THISNODE addition is that it is better to
provide local memory in PAGE_SIZE units than to use remote NUMA THP
backed memory. That largely depends on the remote latency though; on
threadrippers for example the overhead is relatively low in my
experience.

The combination of __GFP_THISNODE and __GFP_DIRECT_RECLAIM results in
extremely slow qemu startup with vfio, if the VM is larger than the
size of one host NUMA node. This is because it will try very hard to
unsuccessfully swapout get_user_pages pinned pages as a result of
__GFP_THISNODE being set, instead of falling back to PAGE_SIZE
allocations and instead of trying to allocate THP on other nodes (it
would be even worse without vfio type1 GUP pins of course, except it'd
be swapping heavily instead).

It's very easy to reproduce this by setting
transparent_hugepage/defrag to "always", even with a simple memhog.

1) This can be fixed by retaining the __GFP_THISNODE logic also for
   __GFP_DIRECT_RECLAIM by allowing only one compaction run. Not even
   COMPACT_SKIPPED (i.e. compaction failing because there is not
   enough free memory in the zone) should be allowed to invoke
   reclaim.

2) An alternative is to not use __GFP_THISNODE if __GFP_DIRECT_RECLAIM
   has been set by the caller (i.e. MADV_HUGEPAGE or
   defrag="always"). That would keep the NUMA locality restriction
   only when __GFP_DIRECT_RECLAIM is not set by the caller.
   So THP will be provided from remote nodes if available before
   falling back to PAGE_SIZE units in the local node, but an app using
   defrag = always (or madvise with MADV_HUGEPAGE) supposedly prefers
   that.

These are the results of 1) (higher GB/s is better).

Finished: 30 GB mapped, 10.188535s elapsed, 2.94GB/s
Finished: 34 GB mapped, 12.274777s elapsed, 2.77GB/s
Finished: 38 GB mapped, 13.847840s elapsed, 2.74GB/s
Finished: 42 GB mapped, 14.288587s elapsed, 2.94GB/s

Finished: 30 GB mapped, 8.907367s elapsed, 3.37GB/s
Finished: 34 GB mapped, 10.724797s elapsed, 3.17GB/s
Finished: 38 GB mapped, 14.272882s elapsed, 2.66GB/s
Finished: 42 GB mapped, 13.929525s elapsed, 3.02GB/s

These are the results of 2) (higher GB/s is better).

Finished: 30 GB mapped, 10.163159s elapsed, 2.95GB/s
Finished: 34 GB mapped, 11.806526s elapsed, 2.88GB/s
Finished: 38 GB mapped, 10.369081s elapsed, 3.66GB/s
Finished: 42 GB mapped, 12.357719s elapsed, 3.40GB/s

Finished: 30 GB mapped, 8.251396s elapsed, 3.64GB/s
Finished: 34 GB mapped, 12.093030s elapsed, 2.81GB/s
Finished: 38 GB mapped, 11.824903s elapsed, 3.21GB/s
Finished: 42 GB mapped, 15.950661s elapsed, 2.63GB/s

This is current upstream (higher GB/s is better).

Finished: 30 GB mapped, 8.821632s elapsed, 3.40GB/s
Finished: 34 GB mapped, 341.979543s elapsed, 0.10GB/s
Finished: 38 GB mapped, 761.933231s elapsed, 0.05GB/s
Finished: 42 GB mapped, 1188.409235s elapsed, 0.04GB/s

vfio is a good test because by pinning all memory it avoids the
swapping and reclaim only wastes CPU; a memhog based test would create
swapout storms and supposedly show a bigger stddev.

Which of 1) and 2) is better depends on the hardware and on the
software. Virtualization with EPT/NPT also gets a bigger boost from
THP than host applications.

This commit implements 1).
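
The difference between current upstream, 1) and 2) can be sketched as a
small userspace model. This is only an illustration under stated
assumptions: the bit values, the model_gfp_t type and every helper name
below are hypothetical and are not the kernel's definitions, and the
real slow path takes many more conditions into account.

```c
/*
 * Userspace model only. Bit values and every helper name below are
 * assumptions made for illustration; they are not the kernel's
 * definitions.
 */
#include <assert.h>
#include <stdbool.h>

typedef unsigned int model_gfp_t;

#define M_DIRECT_RECLAIM	0x1u	/* caller allows direct reclaim/compaction */
#define M_THISNODE		0x2u	/* allocation restricted to the local node */
#define M_NORETRY		0x4u
#define M_ONLY_COMPACT_BIT	0x8u	/* stands in for ___GFP_ONLY_COMPACT */
/* As in the patch, the public flag also carries the NORETRY bit. */
#define M_ONLY_COMPACT		(M_NORETRY | M_ONLY_COMPACT_BIT)

/* Current upstream as described above: always restrict to the local node. */
static inline model_gfp_t thp_gfp_upstream(model_gfp_t gfp)
{
	return gfp | M_THISNODE;
}

/* Option 1): keep the node restriction, allow at most compaction. */
static inline model_gfp_t thp_gfp_option1(model_gfp_t gfp)
{
	return gfp | M_THISNODE | M_ONLY_COMPACT;
}

/* Option 2): drop the node restriction when the caller allows reclaim. */
static inline model_gfp_t thp_gfp_option2(model_gfp_t gfp)
{
	return (gfp & M_DIRECT_RECLAIM) ? gfp : (gfp | M_THISNODE);
}

/* Simplified: reclaim may run unless the dedicated only-compact bit is set. */
static inline bool may_reclaim(model_gfp_t gfp)
{
	if (gfp & M_ONLY_COMPACT_BIT)
		return false;
	return (gfp & M_DIRECT_RECLAIM) != 0;
}

static inline bool may_use_remote_node(model_gfp_t gfp)
{
	return !(gfp & M_THISNODE);
}

/* The slow case from the report: reclaim allowed but confined to one node. */
static inline bool reclaims_local_node_only(model_gfp_t gfp)
{
	assert(gfp != 0);	/* every request in this model carries a flag */
	return may_reclaim(gfp) && !may_use_remote_node(gfp);
}
```

In this model only thp_gfp_upstream() applied to a mask with
M_DIRECT_RECLAIM makes reclaims_local_node_only() true. 1) avoids it
by forbidding reclaim, 2) avoids it by allowing remote nodes.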

Reported-by: Alex Williamson
Signed-off-by: Andrea Arcangeli
Reported-by: Stefan Priebe
Signed-off-by: Michal Hocko
---
 include/linux/gfp.h | 18 ++++++++++++++++++
 mm/mempolicy.c      | 12 +++++++++++-
 mm/page_alloc.c     |  4 ++++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index a6afcec53795..3c04d5d90e6d 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -44,6 +44,7 @@ struct vm_area_struct;
 #else
 #define ___GFP_NOLOCKDEP	0
 #endif
+#define ___GFP_ONLY_COMPACT	0x1000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -178,6 +179,21 @@ struct vm_area_struct;
  *   definitely preferable to use the flag rather than opencode endless
  *   loop around allocator.
  *   Using this flag for costly allocations is _highly_ discouraged.
+ *
+ * __GFP_ONLY_COMPACT: Only invoke compaction. Do not try to succeed
+ * the allocation by freeing memory. Never risk to free any
+ * "PAGE_SIZE" memory unit even if compaction failed specifically
+ * because of not enough free pages in the zone. This only makes sense
+ * only in combination with __GFP_THISNODE (enforced with a
+ * VM_WARN_ON), to restrict the THP allocation in the local node that
+ * triggered the page fault and fallback into PAGE_SIZE allocations in
+ * the same node. We don't want to invoke reclaim because there may be
+ * plenty of free memory already in the local node. More importantly
+ * there may be even plenty of free THP available in remote nodes so
+ * we should allocate those if something instead of reclaiming any
+ * memory in the local node. Implementation detail: set ___GFP_NORETRY
+ * too so that ___GFP_ONLY_COMPACT only needs to be checked in a slow
+ * path.
  */
 #define __GFP_IO	((__force gfp_t)___GFP_IO)
 #define __GFP_FS	((__force gfp_t)___GFP_FS)
@@ -187,6 +203,8 @@ struct vm_area_struct;
 #define __GFP_RETRY_MAYFAIL	((__force gfp_t)___GFP_RETRY_MAYFAIL)
 #define __GFP_NOFAIL	((__force gfp_t)___GFP_NOFAIL)
 #define __GFP_NORETRY	((__force gfp_t)___GFP_NORETRY)
+#define __GFP_ONLY_COMPACT	((__force gfp_t)(___GFP_NORETRY | \
+						 ___GFP_ONLY_COMPACT))
 
 /*
  * Action modifiers
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d6512ef28cde..6bf839f20dcc 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2047,8 +2047,18 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 
 		if (!nmask || node_isset(hpage_node, *nmask)) {
 			mpol_cond_put(pol);
+			/*
+			 * We restricted the allocation to the
+			 * hpage_node so we must use
+			 * __GFP_ONLY_COMPACT to allow at most a
+			 * compaction attempt and not ever get into
+			 * reclaim or it'll swap heavily with
+			 * transparent_hugepage/defrag = always (or
+			 * madvise under MADV_HUGEPAGE).
+			 */
 			page = __alloc_pages_node(hpage_node,
-						gfp | __GFP_THISNODE, order);
+						  gfp | __GFP_THISNODE |
+						  __GFP_ONLY_COMPACT, order);
 			goto out;
 		}
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a790ef4be74e..01a5c2bd0860 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4144,6 +4144,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 			 */
 			if (compact_result == COMPACT_DEFERRED)
 				goto nopage;
+			if (gfp_mask & __GFP_ONLY_COMPACT) {
+				VM_WARN_ON(!(gfp_mask & __GFP_THISNODE));
+				goto nopage;
+			}
 
 			/*
 			 * Looks like reclaim/compaction is worth trying, but