diff mbox series

[v7,3/5] mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY

Message ID 1627970362-61305-4-git-send-email-feng.tang@intel.com (mailing list archive)
State New
Headers show
Series Introduce multi-preference mempolicy | expand

Commit Message

Feng Tang Aug. 3, 2021, 5:59 a.m. UTC
From: Ben Widawsky <ben.widawsky@intel.com>

Implement the missing huge page allocation functionality while obeying
the preferred node semantics. This is similar to the implementation
for general page allocation, as it uses a fallback mechanism to try
multiple preferred nodes first, and then all other nodes.

[akpm: fix compling issue when merging with other hugetlb patch]
[Thanks to 0day bot for catching the missing #ifdef CONFIG_NUMA issue]
Link: https://lore.kernel.org/r/20200630212517.308045-12-ben.widawsky@intel.com
Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Co-developed-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 mm/hugetlb.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

Comments

Michal Hocko Aug. 6, 2021, 1:35 p.m. UTC | #1
On Tue 03-08-21 13:59:20, Feng Tang wrote:
> From: Ben Widawsky <ben.widawsky@intel.com>
> 
> Implement the missing huge page allocation functionality while obeying
> the preferred node semantics. This is similar to the implementation
> for general page allocation, as it uses a fallback mechanism to try
> multiple preferred nodes first, and then all other nodes.
> 
> [akpm: fix compling issue when merging with other hugetlb patch]
> [Thanks to 0day bot for catching the missing #ifdef CONFIG_NUMA issue]
> Link: https://lore.kernel.org/r/20200630212517.308045-12-ben.widawsky@intel.com
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> Co-developed-by: Feng Tang <feng.tang@intel.com>
> Signed-off-by: Feng Tang <feng.tang@intel.com>

ifdefery is just ugly as hell. One way to get rid of that would be to
provide a mpol_is_preferred_many() wrapper and hide the CONFIG_NUMA in
mempolicy.h. I haven't checked but this might help to remove some other
ifdefery as well.

I especially dislike the label hidden in the ifdef. You can get rid of
that by checking the page for NULL.

> ---
>  mm/hugetlb.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 95714fb28150..9279f6d478d9 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1166,7 +1166,20 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
>  
>  	gfp_mask = htlb_alloc_mask(h);
>  	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
> +#ifdef CONFIG_NUMA
> +	if (mpol->mode == MPOL_PREFERRED_MANY) {
> +		page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
> +		if (page)
> +			goto check_reserve;
> +		/* Fallback to all nodes */
> +		nodemask = NULL;
> +	}
> +#endif
>  	page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
> +
> +#ifdef CONFIG_NUMA
> +check_reserve:
> +#endif
>  	if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
>  		SetHPageRestoreReserve(page);
>  		h->resv_huge_pages--;
> @@ -2147,6 +2160,21 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
>  	nodemask_t *nodemask;
>  
>  	nid = huge_node(vma, addr, gfp_mask, &mpol, &nodemask);
> +#ifdef CONFIG_NUMA
> +	if (mpol->mode == MPOL_PREFERRED_MANY) {
> +		gfp_t gfp = gfp_mask | __GFP_NOWARN;
> +
> +		gfp &=  ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
> +		page = alloc_surplus_huge_page(h, gfp, nid, nodemask, false);
> +		if (page) {
> +			mpol_cond_put(mpol);
> +			return page;
> +		}
> +
> +		/* Fallback to all nodes */
> +		nodemask = NULL;
> +	}
> +#endif
>  	page = alloc_surplus_huge_page(h, gfp_mask, nid, nodemask, false);
>  	mpol_cond_put(mpol);
>  
> -- 
> 2.14.1
Feng Tang Aug. 9, 2021, 2:44 a.m. UTC | #2
Hi Michal,

Thanks for the review and ACKs to 1/5 and 2/5 patches.

On Fri, Aug 06, 2021 at 03:35:48PM +0200, Michal Hocko wrote:
> On Tue 03-08-21 13:59:20, Feng Tang wrote:
> > From: Ben Widawsky <ben.widawsky@intel.com>
> > 
> > Implement the missing huge page allocation functionality while obeying
> > the preferred node semantics. This is similar to the implementation
> > for general page allocation, as it uses a fallback mechanism to try
> > multiple preferred nodes first, and then all other nodes.
> > 
> > [akpm: fix compling issue when merging with other hugetlb patch]
> > [Thanks to 0day bot for catching the missing #ifdef CONFIG_NUMA issue]
> > Link: https://lore.kernel.org/r/20200630212517.308045-12-ben.widawsky@intel.com
> > Suggested-by: Michal Hocko <mhocko@suse.com>
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > Co-developed-by: Feng Tang <feng.tang@intel.com>
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> 
> ifdefery is just ugly as hell. One way to get rid of that would be to
> provide a mpol_is_preferred_many() wrapper and hide the CONFIG_NUMA in
> mempolicy.h. I haven't checked but this might help to remove some other
> ifdefery as well.
> 
> I especially dislike the label hidden in the ifdef. You can get rid of
> that by checking the page for NULL.

Yes, the 'ifdef's were annoying to me too, and thanks for the suggestions.
Following is the revised patch upon the suggestion.

Thanks,
Feng

-------8<---------------------

From fc30718c40f02ba5ea73456af49173e66b5032c1 Mon Sep 17 00:00:00 2001
From: Ben Widawsky <ben.widawsky@intel.com>
Date: Thu, 5 Aug 2021 23:01:11 -0400
Subject: [PATCH] mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY

Implement the missing huge page allocation functionality while obeying the
preferred node semantics.  This is similar to the implementation for
general page allocation, as it uses a fallback mechanism to try multiple
preferred nodes first, and then all other nodes. 

To avoid adding too many "#ifdef CONFIG_NUMA" check, add a helper function
in mempolicy.h to check whether a mempolicy is MPOL_PREFERRED_MANY.

[akpm: fix compling issue when merging with other hugetlb patch]
[Thanks to 0day bot for catching the !CONFIG_NUMA compiling issue]
[Michal Hocko: suggest to remove the #ifdef CONFIG_NUMA check]
Link: https://lore.kernel.org/r/20200630212517.308045-12-ben.widawsky@intel.com
Link: https://lkml.kernel.org/r/1627970362-61305-4-git-send-email-feng.tang@intel.com
Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Co-developed-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
--
 include/linux/mempolicy.h | 12 ++++++++++++
 mm/hugetlb.c              | 28 ++++++++++++++++++++++++----
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 0117e1e..60d5e6c 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -187,6 +187,12 @@ extern void mpol_put_task_policy(struct task_struct *);
 
 extern bool numa_demotion_enabled;
 
+static inline bool mpol_is_preferred_many(struct mempolicy *pol)
+{
+	return  (pol->mode == MPOL_PREFERRED_MANY);
+}
+
+
 #else
 
 struct mempolicy {};
@@ -297,5 +303,11 @@ static inline nodemask_t *policy_nodemask_current(gfp_t gfp)
 }
 
 #define numa_demotion_enabled	false
+
+static inline bool mpol_is_preferred_many(struct mempolicy *pol)
+{
+	return  false;
+}
+
 #endif /* CONFIG_NUMA */
 #endif
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 95714fb..75ea8bc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1145,7 +1145,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 				unsigned long address, int avoid_reserve,
 				long chg)
 {
-	struct page *page;
+	struct page *page = NULL;
 	struct mempolicy *mpol;
 	gfp_t gfp_mask;
 	nodemask_t *nodemask;
@@ -1166,7 +1166,17 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 
 	gfp_mask = htlb_alloc_mask(h);
 	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
-	page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
+
+	if (mpol_is_preferred_many(mpol)) {
+		page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
+
+		/* Fallback to all nodes if page==NULL */
+		nodemask = NULL;
+	}
+
+	if (!page)
+		page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
+
 	if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
 		SetHPageRestoreReserve(page);
 		h->resv_huge_pages--;
@@ -2147,9 +2157,19 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
 	nodemask_t *nodemask;
 
 	nid = huge_node(vma, addr, gfp_mask, &mpol, &nodemask);
-	page = alloc_surplus_huge_page(h, gfp_mask, nid, nodemask, false);
-	mpol_cond_put(mpol);
+	if (mpol_is_preferred_many(mpol)) {
+		gfp_t gfp = gfp_mask | __GFP_NOWARN;
 
+		gfp &=  ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
+		page = alloc_surplus_huge_page(h, gfp, nid, nodemask, false);
+
+		/* Fallback to all nodes if page==NULL */
+		nodemask = NULL;
+	}
+
+	if (!page)
+		page = alloc_surplus_huge_page(h, gfp_mask, nid, nodemask, false);
+	mpol_cond_put(mpol);
 	return page;
 }
Michal Hocko Aug. 9, 2021, 8:41 a.m. UTC | #3
On Mon 09-08-21 10:44:30, Feng Tang wrote:
> Hi Michal,
> 
> Thanks for the review and ACKs to 1/5 and 2/5 patches.
> 
> On Fri, Aug 06, 2021 at 03:35:48PM +0200, Michal Hocko wrote:
> > On Tue 03-08-21 13:59:20, Feng Tang wrote:
> > > From: Ben Widawsky <ben.widawsky@intel.com>
> > > 
> > > Implement the missing huge page allocation functionality while obeying
> > > the preferred node semantics. This is similar to the implementation
> > > for general page allocation, as it uses a fallback mechanism to try
> > > multiple preferred nodes first, and then all other nodes.
> > > 
> > > [akpm: fix compling issue when merging with other hugetlb patch]
> > > [Thanks to 0day bot for catching the missing #ifdef CONFIG_NUMA issue]
> > > Link: https://lore.kernel.org/r/20200630212517.308045-12-ben.widawsky@intel.com
> > > Suggested-by: Michal Hocko <mhocko@suse.com>
> > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > Co-developed-by: Feng Tang <feng.tang@intel.com>
> > > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > 
> > ifdefery is just ugly as hell. One way to get rid of that would be to
> > provide a mpol_is_preferred_many() wrapper and hide the CONFIG_NUMA in
> > mempolicy.h. I haven't checked but this might help to remove some other
> > ifdefery as well.
> > 
> > I especially dislike the label hidden in the ifdef. You can get rid of
> > that by checking the page for NULL.
> 
> Yes, the 'ifdef's were annoying to me too, and thanks for the suggestions.
> Following is the revised patch upon the suggestion.
> 
> Thanks,
> Feng
> 
> -------8<---------------------
> 
> >From fc30718c40f02ba5ea73456af49173e66b5032c1 Mon Sep 17 00:00:00 2001
> From: Ben Widawsky <ben.widawsky@intel.com>
> Date: Thu, 5 Aug 2021 23:01:11 -0400
> Subject: [PATCH] mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
> 
> Implement the missing huge page allocation functionality while obeying the
> preferred node semantics.  This is similar to the implementation for
> general page allocation, as it uses a fallback mechanism to try multiple
> preferred nodes first, and then all other nodes. 
> 
> To avoid adding too many "#ifdef CONFIG_NUMA" check, add a helper function
> in mempolicy.h to check whether a mempolicy is MPOL_PREFERRED_MANY.
> 
> [akpm: fix compling issue when merging with other hugetlb patch]
> [Thanks to 0day bot for catching the !CONFIG_NUMA compiling issue]
> [Michal Hocko: suggest to remove the #ifdef CONFIG_NUMA check]
> Link: https://lore.kernel.org/r/20200630212517.308045-12-ben.widawsky@intel.com
> Link: https://lkml.kernel.org/r/1627970362-61305-4-git-send-email-feng.tang@intel.com
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> Co-developed-by: Feng Tang <feng.tang@intel.com>
> Signed-off-by: Feng Tang <feng.tang@intel.com>

Yeah. This looks much better. Thanks!
Acked-by: Michal Hocko <mhocko@suse.com>
Do you think you can provide same helpers for other policies as well?
Maybe we can get rid of some other ifdefery as well.

Thanks!
Feng Tang Aug. 9, 2021, 12:37 p.m. UTC | #4
On Mon, Aug 09, 2021 at 10:41:40AM +0200, Michal Hocko wrote:
[snip]
> > >From fc30718c40f02ba5ea73456af49173e66b5032c1 Mon Sep 17 00:00:00 2001
> > From: Ben Widawsky <ben.widawsky@intel.com>
> > Date: Thu, 5 Aug 2021 23:01:11 -0400
> > Subject: [PATCH] mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
> > 
> > Implement the missing huge page allocation functionality while obeying the
> > preferred node semantics.  This is similar to the implementation for
> > general page allocation, as it uses a fallback mechanism to try multiple
> > preferred nodes first, and then all other nodes. 
> > 
> > To avoid adding too many "#ifdef CONFIG_NUMA" check, add a helper function
> > in mempolicy.h to check whether a mempolicy is MPOL_PREFERRED_MANY.
> > 
> > [akpm: fix compling issue when merging with other hugetlb patch]
> > [Thanks to 0day bot for catching the !CONFIG_NUMA compiling issue]
> > [Michal Hocko: suggest to remove the #ifdef CONFIG_NUMA check]
> > Link: https://lore.kernel.org/r/20200630212517.308045-12-ben.widawsky@intel.com
> > Link: https://lkml.kernel.org/r/1627970362-61305-4-git-send-email-feng.tang@intel.com
> > Suggested-by: Michal Hocko <mhocko@suse.com>
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > Co-developed-by: Feng Tang <feng.tang@intel.com>
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> 
> Yeah. This looks much better. Thanks!
> Acked-by: Michal Hocko <mhocko@suse.com>

Thank you!

> Do you think you can provide same helpers for other policies as well?
> Maybe we can get rid of some other ifdefery as well.

Sure. I can make separate patch(es) for that.

And you mean helper like mpol_is_bind/default/local/preferred? 

I just run 'git-grep MPOL', and for places using "mode == MPOL_XXX",
mostly they are in mempolicy.[ch], the only another place is in
shmem.c, do we need to create all the helpers for it and the
potential future users? 

Thanks,
Feng
Michal Hocko Aug. 9, 2021, 1:19 p.m. UTC | #5
On Mon 09-08-21 20:37:47, Feng Tang wrote:
> On Mon, Aug 09, 2021 at 10:41:40AM +0200, Michal Hocko wrote:
> [snip]
> > > >From fc30718c40f02ba5ea73456af49173e66b5032c1 Mon Sep 17 00:00:00 2001
> > > From: Ben Widawsky <ben.widawsky@intel.com>
> > > Date: Thu, 5 Aug 2021 23:01:11 -0400
> > > Subject: [PATCH] mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
> > > 
> > > Implement the missing huge page allocation functionality while obeying the
> > > preferred node semantics.  This is similar to the implementation for
> > > general page allocation, as it uses a fallback mechanism to try multiple
> > > preferred nodes first, and then all other nodes. 
> > > 
> > > To avoid adding too many "#ifdef CONFIG_NUMA" check, add a helper function
> > > in mempolicy.h to check whether a mempolicy is MPOL_PREFERRED_MANY.
> > > 
> > > [akpm: fix compling issue when merging with other hugetlb patch]
> > > [Thanks to 0day bot for catching the !CONFIG_NUMA compiling issue]
> > > [Michal Hocko: suggest to remove the #ifdef CONFIG_NUMA check]
> > > Link: https://lore.kernel.org/r/20200630212517.308045-12-ben.widawsky@intel.com
> > > Link: https://lkml.kernel.org/r/1627970362-61305-4-git-send-email-feng.tang@intel.com
> > > Suggested-by: Michal Hocko <mhocko@suse.com>
> > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > Co-developed-by: Feng Tang <feng.tang@intel.com>
> > > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > 
> > Yeah. This looks much better. Thanks!
> > Acked-by: Michal Hocko <mhocko@suse.com>
> 
> Thank you!
> 
> > Do you think you can provide same helpers for other policies as well?
> > Maybe we can get rid of some other ifdefery as well.
> 
> Sure. I can make separate patch(es) for that.
> 
> And you mean helper like mpol_is_bind/default/local/preferred? 
> 
> I just run 'git-grep MPOL', and for places using "mode == MPOL_XXX",
> mostly they are in mempolicy.[ch], the only another place is in
> shmem.c, do we need to create all the helpers for it and the
> potential future users? 

I would just go with those instances which need to ifdef for NUMA.
Thanks!
Feng Tang Aug. 10, 2021, 8:50 a.m. UTC | #6
On Mon, Aug 09, 2021 at 03:19:32PM +0200, Michal Hocko wrote:
[snip]
> > > Do you think you can provide same helpers for other policies as well?
> > > Maybe we can get rid of some other ifdefery as well.
> > 
> > Sure. I can make separate patch(es) for that.
> > 
> > And you mean helper like mpol_is_bind/default/local/preferred? 
> > 
> > I just run 'git-grep MPOL', and for places using "mode == MPOL_XXX",
> > mostly they are in mempolicy.[ch], the only another place is in
> > shmem.c, do we need to create all the helpers for it and the
> > potential future users? 
> 
> I would just go with those instances which need to ifdef for NUMA.
> Thanks!

Yes, following is a patch to remove one CONFIG_NUMA check, though
an bolder idea to extend the patch by removing the CONFIG_TMPFS
check in the same line.

Thanks,
Feng

---------8<---------------------------------

From 1a5858721ac8ce99c27c13d310bba2983dc73d97 Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang@intel.com>
Date: Tue, 10 Aug 2021 17:00:59 +0800
Subject: [PATCH] mm: shmem: avoid open coded check for mempolicy's mode

Add a mempolicy helper to do the check, which can also remove
a CONFIG_NUMA option check.

Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 include/linux/mempolicy.h | 14 ++++++++++++++
 mm/shmem.c                |  8 ++++----
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 60d5e6c3340c..8fc518ad4f3c 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -192,6 +192,10 @@ static inline bool mpol_is_preferred_many(struct mempolicy *pol)
 	return  (pol->mode == MPOL_PREFERRED_MANY);
 }
 
+static inline bool mpol_is_default(struct mempolicy *pol)
+{
+	return  (pol->mode == MPOL_DEFAULT);
+}
 
 #else
 
@@ -287,6 +291,10 @@ static inline int mpol_parse_str(char *str, struct mempolicy **mpol)
 }
 #endif
 
+static inline void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
+{
+}
+
 static inline int mpol_misplaced(struct page *page, struct vm_area_struct *vma,
 				 unsigned long address)
 {
@@ -309,5 +317,11 @@ static inline bool mpol_is_preferred_many(struct mempolicy *pol)
 	return  false;
 }
 
+static inline bool mpol_is_default(struct mempolicy *pol)
+{
+	return  false;
+}
+
+
 #endif /* CONFIG_NUMA */
 #endif
diff --git a/mm/shmem.c b/mm/shmem.c
index 96f05f6af8bb..26b195209ef7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1437,12 +1437,12 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	return 0;
 }
 
-#if defined(CONFIG_NUMA) && defined(CONFIG_TMPFS)
+#ifdef CONFIG_TMPFS
 static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
 {
 	char buffer[64];
 
-	if (!mpol || mpol->mode == MPOL_DEFAULT)
+	if (!mpol || mpol_is_default(mpol))
 		return;		/* show nothing */
 
 	mpol_to_str(buffer, sizeof(buffer), mpol);
@@ -1461,7 +1461,7 @@ static struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo)
 	}
 	return mpol;
 }
-#else /* !CONFIG_NUMA || !CONFIG_TMPFS */
+#else /* !CONFIG_TMPFS */
 static inline void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
 {
 }
@@ -1469,7 +1469,7 @@ static inline struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo)
 {
 	return NULL;
 }
-#endif /* CONFIG_NUMA && CONFIG_TMPFS */
+#endif /* CONFIG_TMPFS */
 #ifndef CONFIG_NUMA
 #define vm_policy vm_private_data
 #endif
Hugh Dickins Aug. 10, 2021, 9:35 p.m. UTC | #7
On Tue, 10 Aug 2021, Feng Tang wrote:
> On Mon, Aug 09, 2021 at 03:19:32PM +0200, Michal Hocko wrote:
> [snip]
> > > > Do you think you can provide same helpers for other policies as well?
> > > > Maybe we can get rid of some other ifdefery as well.
> > > 
> > > Sure. I can make separate patch(es) for that.
> > > 
> > > And you mean helper like mpol_is_bind/default/local/preferred? 
> > > 
> > > I just run 'git-grep MPOL', and for places using "mode == MPOL_XXX",
> > > mostly they are in mempolicy.[ch], the only another place is in
> > > shmem.c, do we need to create all the helpers for it and the
> > > potential future users? 
> > 
> > I would just go with those instances which need to ifdef for NUMA.
> > Thanks!
> 
> Yes, following is a patch to remove one CONFIG_NUMA check, though
> an bolder idea to extend the patch by removing the CONFIG_TMPFS
> check in the same line.
> 
> Thanks,
> Feng
> 
> ---------8<---------------------------------
> 
> From 1a5858721ac8ce99c27c13d310bba2983dc73d97 Mon Sep 17 00:00:00 2001
> From: Feng Tang <feng.tang@intel.com>
> Date: Tue, 10 Aug 2021 17:00:59 +0800
> Subject: [PATCH] mm: shmem: avoid open coded check for mempolicy's mode
> 
> Add a mempolicy helper to do the check, which can also remove
> a CONFIG_NUMA option check.
> 
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Feng Tang <feng.tang@intel.com>

No thanks: this is not an improvement.

The "#if defined(CONFIG_NUMA) && defined(CONFIG_TMPFS)" is there to
eliminate dead code that would not be automatically eliminated by the
optimizer: it's not there just to avoid MPOL_DEFAULT, and it's there
to cover shmem_get_sbmpol() along with shmem_show_mpol().

I know we tend to avoid #ifdefs in .c files, and that's good; and
I know you could find other code in mm/shmem.c which might also be
#ifdef'ed to eliminate other dead code in other configs; but unless
there's a new drive to purge our .c source of all #ifdefs,
please just leave this as is.

Thanks,
Hugh

> ---
>  include/linux/mempolicy.h | 14 ++++++++++++++
>  mm/shmem.c                |  8 ++++----
>  2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
> index 60d5e6c3340c..8fc518ad4f3c 100644
> --- a/include/linux/mempolicy.h
> +++ b/include/linux/mempolicy.h
> @@ -192,6 +192,10 @@ static inline bool mpol_is_preferred_many(struct mempolicy *pol)
>  	return  (pol->mode == MPOL_PREFERRED_MANY);
>  }
>  
> +static inline bool mpol_is_default(struct mempolicy *pol)
> +{
> +	return  (pol->mode == MPOL_DEFAULT);
> +}
>  
>  #else
>  
> @@ -287,6 +291,10 @@ static inline int mpol_parse_str(char *str, struct mempolicy **mpol)
>  }
>  #endif
>  
> +static inline void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
> +{
> +}
> +
>  static inline int mpol_misplaced(struct page *page, struct vm_area_struct *vma,
>  				 unsigned long address)
>  {
> @@ -309,5 +317,11 @@ static inline bool mpol_is_preferred_many(struct mempolicy *pol)
>  	return  false;
>  }
>  
> +static inline bool mpol_is_default(struct mempolicy *pol)
> +{
> +	return  false;
> +}
> +
> +
>  #endif /* CONFIG_NUMA */
>  #endif
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 96f05f6af8bb..26b195209ef7 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1437,12 +1437,12 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  	return 0;
>  }
>  
> -#if defined(CONFIG_NUMA) && defined(CONFIG_TMPFS)
> +#ifdef CONFIG_TMPFS
>  static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
>  {
>  	char buffer[64];
>  
> -	if (!mpol || mpol->mode == MPOL_DEFAULT)
> +	if (!mpol || mpol_is_default(mpol))
>  		return;		/* show nothing */
>  
>  	mpol_to_str(buffer, sizeof(buffer), mpol);
> @@ -1461,7 +1461,7 @@ static struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo)
>  	}
>  	return mpol;
>  }
> -#else /* !CONFIG_NUMA || !CONFIG_TMPFS */
> +#else /* !CONFIG_TMPFS */
>  static inline void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
>  {
>  }
> @@ -1469,7 +1469,7 @@ static inline struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo)
>  {
>  	return NULL;
>  }
> -#endif /* CONFIG_NUMA && CONFIG_TMPFS */
> +#endif /* CONFIG_TMPFS */
>  #ifndef CONFIG_NUMA
>  #define vm_policy vm_private_data
>  #endif
> -- 
> 2.14.1
Feng Tang Aug. 11, 2021, 1:37 a.m. UTC | #8
Hi Huge,

On Tue, Aug 10, 2021 at 02:35:05PM -0700, Hugh Dickins wrote:
> On Tue, 10 Aug 2021, Feng Tang wrote:
> > On Mon, Aug 09, 2021 at 03:19:32PM +0200, Michal Hocko wrote:
> > [snip]
> > > > > Do you think you can provide same helpers for other policies as well?
> > > > > Maybe we can get rid of some other ifdefery as well.
> > > > 
> > > > Sure. I can make separate patch(es) for that.
> > > > 
> > > > And you mean helper like mpol_is_bind/default/local/preferred? 
> > > > 
> > > > I just run 'git-grep MPOL', and for places using "mode == MPOL_XXX",
> > > > mostly they are in mempolicy.[ch], the only another place is in
> > > > shmem.c, do we need to create all the helpers for it and the
> > > > potential future users? 
> > > 
> > > I would just go with those instances which need to ifdef for NUMA.
> > > Thanks!
> > 
> > Yes, following is a patch to remove one CONFIG_NUMA check, though
> > an bolder idea to extend the patch by removing the CONFIG_TMPFS
> > check in the same line.
> > 
> > Thanks,
> > Feng
> > 
> > ---------8<---------------------------------
> > 
> > From 1a5858721ac8ce99c27c13d310bba2983dc73d97 Mon Sep 17 00:00:00 2001
> > From: Feng Tang <feng.tang@intel.com>
> > Date: Tue, 10 Aug 2021 17:00:59 +0800
> > Subject: [PATCH] mm: shmem: avoid open coded check for mempolicy's mode
> > 
> > Add a mempolicy helper to do the check, which can also remove
> > a CONFIG_NUMA option check.
> > 
> > Suggested-by: Michal Hocko <mhocko@suse.com>
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> 
> No thanks: this is not an improvement.
> 
> The "#if defined(CONFIG_NUMA) && defined(CONFIG_TMPFS)" is there to
> eliminate dead code that would not be automatically eliminated by the
> optimizer: it's not there just to avoid MPOL_DEFAULT, and it's there
> to cover shmem_get_sbmpol() along with shmem_show_mpol().

Thanks for the explaination! I did some tests that in !NUMA case, the
'sbinfo->mpol' is always NULL (I could be wrong) which makes the
2 functions almost non-ops.

> I know we tend to avoid #ifdefs in .c files, and that's good; and
> I know you could find other code in mm/shmem.c which might also be
> #ifdef'ed to eliminate other dead code in other configs; but unless
> there's a new drive to purge our .c source of all #ifdefs,
> please just leave this as is.
 
Ok, and sorry for the noise.

Thanks,
Feng

> Thanks,
> Hugh
>
diff mbox series

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 95714fb28150..9279f6d478d9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1166,7 +1166,20 @@  static struct page *dequeue_huge_page_vma(struct hstate *h,
 
 	gfp_mask = htlb_alloc_mask(h);
 	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
+#ifdef CONFIG_NUMA
+	if (mpol->mode == MPOL_PREFERRED_MANY) {
+		page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
+		if (page)
+			goto check_reserve;
+		/* Fallback to all nodes */
+		nodemask = NULL;
+	}
+#endif
 	page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
+
+#ifdef CONFIG_NUMA
+check_reserve:
+#endif
 	if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
 		SetHPageRestoreReserve(page);
 		h->resv_huge_pages--;
@@ -2147,6 +2160,21 @@  struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
 	nodemask_t *nodemask;
 
 	nid = huge_node(vma, addr, gfp_mask, &mpol, &nodemask);
+#ifdef CONFIG_NUMA
+	if (mpol->mode == MPOL_PREFERRED_MANY) {
+		gfp_t gfp = gfp_mask | __GFP_NOWARN;
+
+		gfp &=  ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
+		page = alloc_surplus_huge_page(h, gfp, nid, nodemask, false);
+		if (page) {
+			mpol_cond_put(mpol);
+			return page;
+		}
+
+		/* Fallback to all nodes */
+		nodemask = NULL;
+	}
+#endif
 	page = alloc_surplus_huge_page(h, gfp_mask, nid, nodemask, false);
 	mpol_cond_put(mpol);