From patchwork Tue Jun 5 19:51:40 2018
X-Patchwork-Submitter: Mel Gorman
X-Patchwork-Id: 10449155
Date: Tue, 5 Jun 2018 20:51:40 +0100
From: Mel Gorman
To: Dave Hansen
Cc: Andrew Morton, mhocko@kernel.org, vbabka@suse.cz, Aaron Lu,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mremap: Avoid TLB flushing anonymous pages that are not
 in swap cache
Message-ID: <20180605195140.afc7xzgbre26m76l@techsingularity.net>
References: <20180605171319.uc5jxdkxopio6kg3@techsingularity.net>
 <20180605191245.3owve7gfut22tyob@techsingularity.net>
In-Reply-To: 
User-Agent: NeoMutt/20170912 (1.9.0)

On Tue, Jun 05, 2018 at 12:49:18PM -0700, Dave Hansen wrote:
> >> I usually try to make the non-pte-modifying functions take a pte_t
> >> instead of 'pte_t *' to make it obvious that there's no modification
> >> going on. Any reason not to do that here?
> >
> > No, it was just a minor saving on stack usage.
>
> We're just splitting hairs now :) but, realistically, this little helper
> will get inlined anyway, so it probably all generates the same code.
>

I expect so, and your point is valid that it should be obvious that no
modifications can take place.

> ...
> >> BTW, do you want to add a tiny comment about why we do the
> >> trylock_page()?
> >> I assume it's because we don't want to wait on finding an exact
> >> answer: we just assume it is in the swap cache if the page is locked
> >> and flush regardless.
> >
> > It's really because calling lock_page while holding a spinlock is
> > eventually going to ruin your day.
>
> Hah, yeah, that'll do it. Could you comment this, too?
>

Yep, I made the change after reading your mail. This is how it currently
stands:

---8<---
mremap: Avoid excessive TLB flushing for anonymous pages that are not in
 swap cache

Commit 5d1904204c99 ("mremap: fix race between mremap() and page cleanning")
fixed races between mremap and other operations for both file-backed and
anonymous mappings. The file-backed case was the most critical as it allowed
the possibility that data could be changed on a physical page after
page_mkclean returned, which could trigger data loss or data integrity
issues.

A customer reported that the cost of the TLB flushes for anonymous mappings
was excessive since this commit, resulting in a 30-50% overall performance
drop on a microbenchmark. Unfortunately, I have no access to the test case,
nor can I describe what it does other than to say that mremap operations
dominate heavily.

The anonymous page race fix is overkill for two reasons. First, pages that
are not in the swap cache are not going to be issued for IO, and if a stale
TLB entry is used, the write still occurs on the same physical page. Second,
any race with mmap replacing the address space is handled by mmap_sem. As
anonymous pages are often dirty, this can mean that mremap always has to
flush even when it is not necessary.

This patch special-cases anonymous pages so that the flush under the page
table lock is only forced if the page is in the swap cache and can
potentially be queued for IO. Note that the full range being mremapped is
still flushed at the end of the operation, so TLB flushes are not eliminated
entirely. The page lock is used to serialise against any potential reclaim.
If the page is added to the swap cache on the reclaim side after the page
lock is dropped on the mremap side, then reclaim will call
try_to_unmap_flush_dirty() before issuing any IO, so there is no data
integrity issue. This means that in the common case, where a workload avoids
swap entirely, mremap is a much cheaper operation due to the reduced TLB
flushing.

Using another test case that simply calls mremap heavily with a varying
number of threads, it was found that, broadly speaking, TLB shootdowns were
reduced by 31% on average throughout the entire test case, but your mileage
will vary.

Signed-off-by: Mel Gorman
Acked-by: Michal Hocko
---
 mm/mremap.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 049470aa1e3e..5b9767b0446e 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -112,6 +113,44 @@ static pte_t move_soft_dirty_pte(pte_t pte)
 	return pte;
 }
 
+/* Returns true if a TLB must be flushed before PTL is dropped */
+static bool should_force_flush(pte_t pte)
+{
+	bool is_swapcache;
+	struct page *page;
+
+	if (!pte_present(pte) || !pte_dirty(pte))
+		return false;
+
+	/*
+	 * If we are remapping a dirty file PTE, make sure to flush TLB
+	 * before we drop the PTL for the old PTE or we may race with
+	 * page_mkclean().
+	 */
+	page = pte_page(pte);
+	if (page_is_file_cache(page))
+		return true;
+
+	/*
+	 * For anonymous pages, only flush swap cache pages that could
+	 * be unmapped and queued for swap since flush_tlb_batched_pending was
+	 * last called. Reclaim itself takes care that the TLB is flushed
+	 * before IO is queued. If a page is not in swap cache and a stale TLB
+	 * is used before mremap is complete then the write hits the same
+	 * physical page and there is no data loss.
+	 *
+	 * Check under the page lock to avoid any potential race with reclaim.
+	 * trylock is necessary as spinlocks are currently held. In the unlikely
+	 * event of contention, flush the TLB to be safe.
+	 */
+	if (!trylock_page(page))
+		return true;
+	is_swapcache = PageSwapCache(page);
+	unlock_page(page);
+
+	return is_swapcache;
+}
+
 static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		unsigned long old_addr, unsigned long old_end,
 		struct vm_area_struct *new_vma, pmd_t *new_pmd,
@@ -163,15 +202,11 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		pte = ptep_get_and_clear(mm, old_addr, old_pte);
 		/*
-		 * If we are remapping a dirty PTE, make sure
-		 * to flush TLB before we drop the PTL for the
-		 * old PTE or we may race with page_mkclean().
-		 *
 		 * This check has to be done after we removed the
 		 * old PTE from page tables or another thread may
 		 * dirty it after the check and before the removal.
 		 */
-		if (pte_present(pte) && pte_dirty(pte))
+		if (should_force_flush(pte))
 			force_flush = true;
 		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
 		pte = move_soft_dirty_pte(pte);
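
For reference, below is a minimal userspace sketch of the kind of
mremap-heavy microbenchmark described in the changelog. It is not the
customer workload (which is not available); the mapping size, iteration
count and layout are arbitrary. It dirties an anonymous mapping and then
ping-pongs it between two addresses with MREMAP_FIXED so that move_ptes()
walks dirty anonymous PTEs on every iteration; with the patch above,
should_force_flush() returns false for those PTEs because they are not in
the swap cache, leaving only the range flush at the end of the move.

/*
 * Hypothetical mremap-heavy microbenchmark sketch, not the customer
 * test case. All sizes and iteration counts are arbitrary.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	const size_t len = 16UL << 20;	/* 16MB of anonymous memory */
	void *src, *dst, *moved;
	int i;

	src = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	/* Reserve a destination range to move the pages into */
	dst = mmap(NULL, len, PROT_NONE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (src == MAP_FAILED || dst == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Dirty every page so the PTEs are pte_dirty() when moved */
	memset(src, 1, len);

	for (i = 0; i < 10000; i++) {
		/* Force a move so move_ptes() processes every PTE */
		moved = mremap(src, len, len,
			       MREMAP_MAYMOVE | MREMAP_FIXED, dst);
		if (moved == MAP_FAILED) {
			perror("mremap");
			return 1;
		}
		/* The old range is now unmapped; reuse it next round */
		dst = src;
		src = moved;
	}

	munmap(src, len);
	return 0;
}

The test case referenced in the changelog presumably runs something like
this from multiple threads, which is where the reduction in remote TLB
shootdowns becomes visible; exact numbers will depend on the machine.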