@@ -1780,6 +1780,47 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
return error;
}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static inline gfp_t shmem_hugepage_gfpmask_fixup(gfp_t gfp,
+ enum sgp_type sgp_huge)
+{
+ const bool vma_madvised = sgp_huge == SGP_HUGE;
+
+ gfp |= __GFP_NOMEMALLOC;
+ gfp &= ~__GFP_RECLAIM;
+
+ /* Force synchronous compaction */
+ if (shmem_huge == SHMEM_HUGE_FORCE)
+ return gfp | __GFP_DIRECT_RECLAIM;
+
+ /* Always do synchronous compaction */
+ if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
+ return gfp | __GFP_DIRECT_RECLAIM | (vma_madvised ? 0 : __GFP_NORETRY);
+
+ /* Kick kcompactd and fail quickly */
+ if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags))
+ return gfp | __GFP_KSWAPD_RECLAIM;
+
+ /* Synchronous compaction if madvised, otherwise kick kcompactd */
+ if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags))
+ return gfp |
+ (vma_madvised ? __GFP_DIRECT_RECLAIM :
+ __GFP_KSWAPD_RECLAIM);
+
+ /* Only do synchronous compaction if madvised */
+ if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags))
+ return gfp | (vma_madvised ? __GFP_DIRECT_RECLAIM : 0);
+
+ return gfp;
+}
+#else
+static inline gfp_t shmem_hugepage_gfpmask_fixup(gfp_t gfp,
+ enum sgp_type sgp_huge)
+{
+ return gfp;
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
/*
* shmem_getpage_gfp - find page in cache, or get from swap, or allocate
*
@@ -1867,6 +1908,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
switch (sbinfo->huge) {
case SHMEM_HUGE_NEVER:
goto alloc_nohuge;
+ case SHMEM_HUGE_ALWAYS:
+ goto alloc_huge;
case SHMEM_HUGE_WITHIN_SIZE: {
loff_t i_size;
pgoff_t off;
@@ -1887,6 +1930,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
}
alloc_huge:
+ gfp = shmem_hugepage_gfpmask_fixup(gfp, sgp_huge);
page = shmem_alloc_and_acct_page(gfp, inode, index, true);
if (IS_ERR(page)) {
alloc_nohuge:
Currently, the gfpmask used in shmem_alloc_hugepage is fixed, i.e., gfp | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN, where gfp comes from the inode mapping, usually GFP_HIGHUSER_MOVABLE. This will introduce direct or kswapd reclaim when the fast path of shmem hugepage allocation fails, which is sometimes unexpected. This applies the effect of the defrag option of anonymous hugepages to shmem hugepages too. By doing so, we can control the defrag behavior of both kinds of THP. This also explicitly adds the SHMEM_HUGE_ALWAYS case in shmem_getpage_gfp, for better code readability. Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> --- mm/shmem.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+)