From patchwork Tue Jun 5 17:13:19 2018
X-Patchwork-Submitter: Mel Gorman
X-Patchwork-Id: 10448807
Date: Tue, 5 Jun 2018 18:13:19 +0100
From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton
Cc: mhocko@kernel.org, vbabka@suse.cz, Aaron Lu, Dave Hansen,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH] mremap: Avoid TLB flushing anonymous pages that are not in swap cache
Message-ID: <20180605171319.uc5jxdkxopio6kg3@techsingularity.net>

Commit 5d1904204c99 ("mremap: fix race between mremap() and page cleanning")
fixed races between mremap and other operations for both file-backed and
anonymous mappings. The file-backed case was the most critical as it allowed
the possibility that data could be changed on a physical page after
page_mkclean() returned, which could trigger data loss or data integrity
issues.

A customer reported that the cost of the TLB flushes for anonymous mappings
was excessive, resulting in a 30-50% overall drop in performance on a
microbenchmark since this commit. Unfortunately I neither have access to the
test case nor can I describe what it does other than saying that mremap
operations dominate heavily. A rough illustration of the general shape of
such a workload follows.
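To be clear, this is only a minimal sketch of the kind of workload where
mremap dominates, not the customer's test case; the thread count, mapping
size and iteration count are made up for illustration:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define NR_THREADS  4
    #define NR_LOOPS    100000UL
    #define MAP_SIZE    (2UL << 20)     /* 2M per worker */

    static void *worker(void *arg)
    {
            unsigned long i;
            char *area, *src, *dst;

            /* Reserve two alternating locations for the mapping */
            area = mmap(NULL, MAP_SIZE * 2, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (area == MAP_FAILED) {
                    perror("mmap");
                    return NULL;
            }
            src = area;
            dst = area + MAP_SIZE;

            /* Dirty the PTEs so the "flush if dirty" check always fires */
            memset(src, 1, MAP_SIZE);

            for (i = 0; i < NR_LOOPS; i++) {
                    char *tmp;

                    /* Force a PTE move by remapping to a fixed new address */
                    if (mremap(src, MAP_SIZE, MAP_SIZE,
                               MREMAP_MAYMOVE | MREMAP_FIXED,
                               dst) == MAP_FAILED) {
                            perror("mremap");
                            break;
                    }

                    /* Keep the pages dirty for the next move */
                    memset(dst, 1, MAP_SIZE);

                    tmp = src;
                    src = dst;
                    dst = tmp;
            }

            return NULL;
    }

    int main(void)
    {
            pthread_t threads[NR_THREADS];
            int i;

            for (i = 0; i < NR_THREADS; i++)
                    pthread_create(&threads[i], NULL, worker, NULL);
            for (i = 0; i < NR_THREADS; i++)
                    pthread_join(threads[i], NULL);

            return 0;
    }

Every iteration moves dirty anonymous PTEs, so before this patch every
mremap call forces a TLB flush even though none of the pages are anywhere
near swap.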
The anonymous page race fix is overkill for two reasons. First, pages that
are not in the swap cache are never issued for IO, so if a stale TLB entry
is used, the write still lands on the same physical page. Second, any race
with mmap replacing the address space is handled by mmap_sem. As anonymous
pages are often dirty, the existing check means that mremap always has to
flush, even when it is not necessary.

This patch special cases anonymous pages to only flush when the page is in
the swap cache and could potentially be queued for IO. It uses the page lock
to serialise against any potential reclaim. If the page is added to the swap
cache on the reclaim side after the page lock is dropped on the mremap side,
then reclaim will call try_to_unmap_flush_dirty() before issuing any IO, so
there is no data integrity issue. This means that in the common case, where
a workload avoids swap entirely, mremap is a much cheaper operation due to
the lack of TLB flushes.

Using another testcase that simply calls mremap heavily with a varying
number of threads, it was found that, broadly speaking, TLB shootdowns were
reduced by 31% on average across the entire test case, but your mileage will
vary (a sketch for sampling the shootdown counters follows the patch).

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/mremap.c | 42 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 5 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 049470aa1e3e..d26c5a00fd9d 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -24,6 +24,7 @@
 #include <linux/uaccess.h>
 #include <linux/mm-arch-hooks.h>
 #include <linux/userfaultfd_k.h>
+#include <linux/mm_inline.h>
 
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
@@ -112,6 +113,41 @@ static pte_t move_soft_dirty_pte(pte_t pte)
 	return pte;
 }
 
+/* Returns true if a TLB must be flushed before PTL is dropped */
+static bool should_force_flush(pte_t *pte)
+{
+	bool is_swapcache;
+	struct page *page;
+
+	if (!pte_present(*pte) || !pte_dirty(*pte))
+		return false;
+
+	/*
+	 * If we are remapping a dirty file PTE, make sure to flush TLB
+	 * before we drop the PTL for the old PTE or we may race with
+	 * page_mkclean().
+	 */
+	page = pte_page(*pte);
+	if (page_is_file_cache(page))
+		return true;
+
+	/*
+	 * For anonymous pages, only flush swap cache pages that could
+	 * be unmapped and queued for swap since flush_tlb_batched_pending
+	 * was last called. Reclaim itself takes care that the TLB is
+	 * flushed before IO is queued. If a page is not in swap cache and
+	 * a stale TLB entry is used before mremap completes then the write
+	 * hits the same physical page and there is no data loss. Check
+	 * under the page lock to avoid any potential race with reclaim.
+	 */
+	if (!trylock_page(page))
+		return true;
+	is_swapcache = PageSwapCache(page);
+	unlock_page(page);
+
+	return is_swapcache;
+}
+
 static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		unsigned long old_addr, unsigned long old_end,
 		struct vm_area_struct *new_vma, pmd_t *new_pmd,
@@ -163,15 +199,11 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 
 		pte = ptep_get_and_clear(mm, old_addr, old_pte);
 		/*
-		 * If we are remapping a dirty PTE, make sure
-		 * to flush TLB before we drop the PTL for the
-		 * old PTE or we may race with page_mkclean().
-		 *
 		 * This check has to be done after we removed the
 		 * old PTE from page tables or another thread may
 		 * dirty it after the check and before the removal.
 		 */
-		if (pte_present(pte) && pte_dirty(pte))
+		if (should_force_flush(&pte))
 			force_flush = true;
 		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
 		pte = move_soft_dirty_pte(pte);
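As an aside, for anyone who wants to eyeball the shootdown reduction on
their own workload: on x86 SMP, remote TLB shootdowns are exported per-CPU
on the "TLB" row of /proc/interrupts. The helper below is only an
illustrative sketch for sampling the system-wide total before and after a
test run; it is not the harness that produced the 31% figure above.

    /* Sum the per-CPU "TLB:" (TLB shootdowns) row in /proc/interrupts */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static unsigned long long tlb_shootdowns(void)
    {
            char line[4096];
            unsigned long long total = 0;
            FILE *fp = fopen("/proc/interrupts", "r");

            if (!fp)
                    return 0;

            while (fgets(line, sizeof(line), fp)) {
                    char *p = line;

                    while (*p == ' ')
                            p++;
                    if (strncmp(p, "TLB:", 4))
                            continue;

                    /* Sum the per-CPU columns after the row label */
                    for (p += 4; *p; ) {
                            char *end;
                            unsigned long long v = strtoull(p, &end, 10);

                            /* Stop at the trailing description text */
                            if (end == p)
                                    break;
                            total += v;
                            p = end;
                    }
                    break;
            }
            fclose(fp);
            return total;
    }

    int main(void)
    {
            printf("TLB shootdowns: %llu\n", tlb_shootdowns());
            return 0;
    }

Sampling the counter before and after a run of the mremap testcase, with and
without this patch applied, should show the relative drop in shootdowns.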