From patchwork Tue Aug 28 11:20:33 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 10578281
From: Nicholas Piggin
To: linux-mm@kvack.org
Cc: Nicholas Piggin, linux-arch@vger.kernel.org,
 linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 Andrew Morton, Linus Torvalds
Subject: [PATCH 2/3] mm/cow: optimise pte dirty/accessed bits handling in fork
Date: Tue, 28 Aug 2018 21:20:33 +1000
Message-Id: <20180828112034.30875-3-npiggin@gmail.com>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20180828112034.30875-1-npiggin@gmail.com>
References: <20180828112034.30875-1-npiggin@gmail.com>

fork clears dirty/accessed bits from new ptes in the child. This logic
has existed since mapped page reclaim was done by scanning ptes, when
it may have been quite important. Today, with physical-based pte
scanning, there is less reason to clear these bits. Dirty bits are all
tested and cleared together, and any dirty bit is the same as many
dirty bits. Any young bit is treated similarly to many young bits, but
not quite the same. A comment has been added where there is some
difference.

This eliminates a major source of the faults that powerpc/radix
requires to set dirty/accessed bits in ptes, speeding up a fork/exit
microbenchmark by about 5% on POWER9 (16600 -> 17500 fork/execs per
second).

Skylake appears to have a micro-fault overhead too. A test allocates
4GB of anonymous memory, reads each page, then forks, and times the
child reading a byte from each page. The first pass over the pages
takes about 1000 cycles per page; the second pass takes about 27
cycles (TLB miss). With no additional minor faults measured due to
either child pass, and the page array well exceeding TLB capacity, the
large cost must come from micro-faults caused by setting the accessed
bit.

Signed-off-by: Nicholas Piggin
---
 mm/huge_memory.c |  2 --
 mm/memory.c      | 10 +++++-----
 mm/vmscan.c      |  8 ++++++++
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d9bae12978ef..5fb1a43e12e0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -977,7 +977,6 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pmdp_set_wrprotect(src_mm, addr, src_pmd);
 		pmd = pmd_wrprotect(pmd);
 	}
-	pmd = pmd_mkold(pmd);
 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
 
 	ret = 0;
@@ -1071,7 +1070,6 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		pudp_set_wrprotect(src_mm, addr, src_pud);
 		pud = pud_wrprotect(pud);
 	}
-	pud = pud_mkold(pud);
 	set_pud_at(dst_mm, addr, dst_pud, pud);
 
 	ret = 0;
diff --git a/mm/memory.c b/mm/memory.c
index b616a69ad770..3d8bf8220bd0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1038,12 +1038,12 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	}
 
 	/*
-	 * If it's a shared mapping, mark it clean in
-	 * the child
+	 * Child inherits dirty and young bits from parent. There is no
+	 * point clearing them because any cleaning or aging has to walk
+	 * all ptes anyway, and it will notice the bits set in the parent.
+	 * Leaving them set avoids stalls and even page faults on CPUs that
+	 * handle these bits in software.
 	 */
-	if (vm_flags & VM_SHARED)
-		pte = pte_mkclean(pte);
-	pte = pte_mkold(pte);
 
 	page = vm_normal_page(vma, addr, pte);
 	if (page) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7e7d25504651..52fe64af3d80 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1021,6 +1021,14 @@ static enum page_references page_check_references(struct page *page,
 		 * to look twice if a mapped file page is used more
 		 * than once.
 		 *
+		 * fork() will set referenced bits in child ptes despite
+		 * not having been accessed, to avoid micro-faults of
+		 * setting accessed bits. This heuristic is not perfectly
+		 * accurate in other ways -- multiple map/unmap in the
+		 * same time window would be treated as multiple references
+		 * despite same number of actual memory accesses made by
+		 * the program.
+		 *
 		 * Mark it and spare it for another trip around the
 		 * inactive list. Another page table reference will
 		 * lead to its activation.