From patchwork Thu May 24 00:58:49 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Huang, Ying" <ying.huang@intel.com>
X-Patchwork-Id: 10422551
From: "Huang, Ying" <ying.huang@intel.com>
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
 Andi Kleen, Jan Kara, Michal Hocko, Andrea Arcangeli,
 "Kirill A. Shutemov", Matthew Wilcox, Hugh Dickins, Minchan Kim,
 Shaohua Li, Christopher Lameter, Mike Kravetz
Subject: [PATCH -V2 -mm 2/4] mm, huge page: Copy target sub-page last when copy huge page
Date: Thu, 24 May 2018 08:58:49 +0800
Message-Id: <20180524005851.4079-3-ying.huang@intel.com>
X-Mailer: git-send-email 2.16.1
In-Reply-To: <20180524005851.4079-1-ying.huang@intel.com>
References: <20180524005851.4079-1-ying.huang@intel.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk
List-ID: <linux-mm.kvack.org>

From: Huang Ying <ying.huang@intel.com>

Huge pages help to reduce the TLB miss rate, but they have a larger
cache footprint, which can sometimes cause problems. For example, when
copying a huge page on an x86_64 platform, the cache footprint is 4M.
But a Xeon E5 v3 2699 CPU has 18 cores, 36 threads, and only 45M of LLC
(last level cache). That is, on average there is only 2.5M of LLC per
core and 1.25M per thread. If cache contention is heavy while copying a
huge page, and we copy it from beginning to end, the beginning of the
huge page may already have been evicted from the cache by the time we
finish copying the end, even though the application is likely to access
the beginning of the huge page right after the copy.

In commit c79b57e462b5d ("mm: hugetlb: clear target sub-page last when
clearing huge page"), to keep the cache lines of the target sub-page
hot, the order in which clear_huge_page() clears the sub-pages was
changed: the sub-page furthest from the target is cleared first, and
the target sub-page last. The same reordering helps huge page copying
too, and this patch implements it. Because the ordering algorithm has
already been factored out into a separate function, the implementation
is quite simple.
This is a generic optimization which should benefit quite a few
workloads, not just one specific use case. To demonstrate the
performance benefit, we tested it with vm-scalability running on
transparent huge pages. With this patch, the throughput increases
~16.6% in the vm-scalability anon-cow-seq test case with 36 processes
on a 2-socket Xeon E5 v3 2699 system (36 cores, 72 threads). The test
case sets /sys/kernel/mm/transparent_hugepage/enabled to "always",
mmap()s a big anonymous memory area and populates it, then forks 36
child processes, each of which writes to the area from beginning to
end, causing copy-on-write. For each child process, the other child
processes can be seen as other workloads which generate heavy cache
pressure. At the same time, the IPC (instructions per cycle) increased
from 0.63 to 0.78, and the time spent in user space was reduced by
~7.2%.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Andi Kleen
Cc: Jan Kara
Cc: Michal Hocko
Cc: Andrea Arcangeli
Cc: "Kirill A. Shutemov"
Cc: Matthew Wilcox
Cc: Hugh Dickins
Cc: Minchan Kim
Cc: Shaohua Li
Cc: Christopher Lameter
Cc: Mike Kravetz
Reviewed-by: Mike Kravetz
---
 include/linux/mm.h |  3 ++-
 mm/huge_memory.c   |  3 ++-
 mm/memory.c        | 30 +++++++++++++++++++++++-------
 3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7cdd8b7f62e5..d227aadaa964 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2734,7 +2734,8 @@ extern void clear_huge_page(struct page *page,
 			    unsigned long addr_hint,
 			    unsigned int pages_per_huge_page);
 extern void copy_user_huge_page(struct page *dst, struct page *src,
-				unsigned long addr, struct vm_area_struct *vma,
+				unsigned long addr_hint,
+				struct vm_area_struct *vma,
 				unsigned int pages_per_huge_page);
 extern long copy_huge_page_from_user(struct page *dst_page,
 				const void __user *usr_src,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e9177363fe2e..1b7fd9bda1dc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1328,7 +1328,8 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 	if (!page)
 		clear_huge_page(new_page, vmf->address, HPAGE_PMD_NR);
 	else
-		copy_user_huge_page(new_page, page, haddr, vma, HPAGE_PMD_NR);
+		copy_user_huge_page(new_page, page, vmf->address,
+				    vma, HPAGE_PMD_NR);
 	__SetPageUptodate(new_page);
 
 	mmun_start = haddr;
diff --git a/mm/memory.c b/mm/memory.c
index b9f573a81bbd..5d432f833d19 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4675,11 +4675,31 @@ static void copy_user_gigantic_page(struct page *dst, struct page *src,
 	}
 }
 
+struct copy_subpage_arg {
+	struct page *dst;
+	struct page *src;
+	struct vm_area_struct *vma;
+};
+
+static void copy_subpage(unsigned long addr, int idx, void *arg)
+{
+	struct copy_subpage_arg *copy_arg = arg;
+
+	copy_user_highpage(copy_arg->dst + idx, copy_arg->src + idx,
+			   addr, copy_arg->vma);
+}
+
 void copy_user_huge_page(struct page *dst, struct page *src,
-			 unsigned long addr, struct vm_area_struct *vma,
+			 unsigned long addr_hint, struct vm_area_struct *vma,
 			 unsigned int pages_per_huge_page)
 {
-	int i;
+	unsigned long addr = addr_hint &
+		~(((unsigned long)pages_per_huge_page << PAGE_SHIFT) - 1);
+	struct copy_subpage_arg arg = {
+		.dst = dst,
+		.src = src,
+		.vma = vma,
+	};
 
 	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
 		copy_user_gigantic_page(dst, src, addr, vma,
@@ -4687,11 +4707,7 @@ void copy_user_huge_page(struct page *dst, struct page *src,
 		return;
 	}
 
-	might_sleep();
-	for (i = 0; i < pages_per_huge_page; i++) {
-		cond_resched();
-		copy_user_highpage(dst + i, src + i, addr + i*PAGE_SIZE, vma);
-	}
+	process_huge_page(addr_hint, pages_per_huge_page, copy_subpage, &arg);
 }
 
 long copy_huge_page_from_user(struct page *dst_page,